Adventures with Samba and KRBTGT

For the last few days I have been investigating a possible migration from our corporate ADS domain to Samba 4.13. Unsurprisingly, this ended up as a little adventure into the Samba source code.

I've requested access to the Samba bugzilla over this but they haven't replied yet, so I figured I'd post about it here in the unlikely case someone else runs into this issue. Searching for the error messages mentioned below was largely fruitless.


I deployed our corporate ADS domain in 2004, on Windows Server 2003. It runs successfully to this day, having been upgraded through Windows 2008, 2008R2, currently on 2012R2, and migrated over many domain servers, first physical, then virtual. Windows ADS in my experience is largely bulletproof but it’s nice to have alternatives available.

Anyway, my first problem was the new Samba DC failing to join the domain because it couldn't verify creation of the necessary DNS records. This was luckily easy to patch. Samba simply creates the missing DNS records later when you actually start it.

Everything seemingly worked well until I demoted and turned off the Windows DCs. Then things became interesting: Samba couldn't authenticate anyone while complaining about Kerberos, no one could log in or access any network shares like SYSVOL, and my test environment fell apart.

The following errors were specifically logged:

Kerberos: samba_kdc_fetch: could not find KRBTGT number 1 in DB!
Kerberos: Ticket-granting ticket not found in database: no such entry found in hdb

That was surprising: why would it try to authenticate via a different, nonexistent key?

Meanwhile, the actual Kerberos server worked fine by itself. I could successfully authenticate and obtain tickets via kinit; it was only authentication through Samba that was broken:

dc2:~:$ kinit fox
Password for [email protected]:
dc2:~:$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: [email protected]

Valid starting     Expires            Service principal
10/29/20 16:01:58  10/30/20 02:01:58  krbtgt/[email protected]
        renew until 10/30/20 16:01:44

dc2:~:$ smbclient -k //DC2/SYSVOL
gse_get_client_auth_token: gss_init_sec_context failed with [ Miscellaneous failure (see text): The ticket isn't for us](2529638947)
gensec_spnego_client_negTokenInit_step: gse_krb5: creating NEG_TOKEN_INIT for cifs/DC2 failed (next[(null)]): NT_STATUS_LOGON_FAILURE
session setup failed: NT_STATUS_LOGON_FAILURE

NTLM authentication was also working properly but it didn't help much.

I got the source code, and added some debugging:

     krb5_warnx(context, "samba_kdc_fetch:KRBTGT: my %u vs req %u kvno %u pr %s rodc %u",
               (unsigned)(kdc_db_ctx->my_krbtgt_number), (unsigned)(krbtgt_number), (unsigned)kvno,
               realm_from_princ, (unsigned)kdc_db_ctx->rodc);

And got the following output:

Kerberos: samba_kdc_fetch: KRBTGT: my 0 vs req 1 kvno 100008 pr X.X.RU rodc 0

My first, wrong, guess was that Samba considered itself a RODC for some reason. Instead, this was entirely about the very special ADS user krbtgt: everything Kerberos-related in ADS revolves around this particular principal.

Samba gets this "KRBTGT number" based on the KVNO of the aforementioned user. KVNO means Key Version Number; it's an integer that increments every time an entity has its password changed, so that authorizations acquired via previous passwords are invalidated. Unfortunately for me, this is how this number is calculated in Samba:

#define SAMBA_KVNO_GET_KRBTGT(kvno) \
	((uint16_t)(((uint32_t)kvno) >> 16))

I have no clue why this has been written this way. I'm sure the Samba developers had their reasons, and this code does work, mostly. On our domain, however, several principals (sadly including krbtgt) had their KVNO ticking from 100000 instead of 0:

# ldbsearch -H sam.ldb msDS-KeyVersionNumber | grep krbtgt -A1
dn: CN=krbtgt,CN=Users,DC=x,DC=x,DC=ru
msDS-KeyVersionNumber: 100008

Again, no idea why, maybe related to an older version of Windows, or possibly somehow to a multi-DC configuration, my uneducated guess would be the latter.

100008 >> 16 is 1 (instead of 0), which refers to a key that rightfully doesn't exist. This prevents Samba from acquiring a ticket-granting ticket, thus breaking Kerberos authentication entirely.

As far as I know, it's impossible to roll back KVNO to a lower value Samba might accept. It's a calculated property based on replPropertyMetaData, which is a huge base64-encoded blob I didn't bother to investigate.

So, the only other possible way to make this work was patching Samba so that the impossibly high KVNO is brought back to a normalized value, by subtracting 100000 when necessary:
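To make the arithmetic concrete, here is a minimal sketch that checks the shift at compile time. The macro is copied from the Samba snippet above; `krbtgt_number_for` is a hypothetical helper name used only for this illustration, not something from the Samba tree.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Copied from the Samba source quoted above. */
#define SAMBA_KVNO_GET_KRBTGT(kvno) \
	((uint16_t)(((uint32_t)kvno) >> 16))

/* Any KVNO below 2^16 = 65536 has empty upper bits, so it maps to 0. */
static_assert(SAMBA_KVNO_GET_KRBTGT(8) == 0, "small kvno maps to 0");
static_assert(SAMBA_KVNO_GET_KRBTGT(65535) == 0, "largest kvno that maps to 0");

/* 100008 = 1 * 65536 + 34472, so the upper 16 bits hold 1. */
static_assert(SAMBA_KVNO_GET_KRBTGT(100008) == 1, "our krbtgt kvno maps to 1");

/* Hypothetical helper (not in Samba): function form of the macro. */
static uint16_t krbtgt_number_for(uint32_t kvno)
{
	return SAMBA_KVNO_GET_KRBTGT(kvno);
}
```

In other words, any KVNO of 65536 or more makes the macro return something other than 0, and a counter that starts at 100000 is already past that point.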

       unsigned int krbtgt_number;
       /* w2k8r2 sometimes gives us a kvno of 255 for inter-domain
          trust tickets. We don't yet know what this means, but we do
          seem to need to treat it as unspecified */
       if (flags & SDB_F_KVNO_SPECIFIED) {
+         if (kvno >= 100000)
+            kvno -= 100000;

          krbtgt_number = SAMBA_KVNO_GET_KRBTGT(kvno);

I didn’t have any principals with KVNO starting from 200000 but I can’t rule out their existence so this patch is definitely not universal. :slight_smile:
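For anyone who wants to poke at the adjustment outside of Samba, here is a rough standalone sketch of the same logic. `normalize_kvno` and `patched_krbtgt_number` are hypothetical names made up for this example; the real patch just edits the value in place as shown above.

```c
#include <stdint.h>

/* Copied from the Samba source quoted earlier. */
#define SAMBA_KVNO_GET_KRBTGT(kvno) \
	((uint16_t)(((uint32_t)kvno) >> 16))

/* Hypothetical helper: the same adjustment as the patch, as a function. */
static uint32_t normalize_kvno(uint32_t kvno)
{
	if (kvno >= 100000)
		kvno -= 100000;
	return kvno;
}

/* Hypothetical helper: KRBTGT number as seen after the adjustment. */
static uint16_t patched_krbtgt_number(uint32_t kvno)
{
	return SAMBA_KVNO_GET_KRBTGT(normalize_kvno(kvno));
}
```

With this, 100008 becomes 8 and maps to KRBTGT number 0 as intended. A KVNO of 200008 would only drop to 100008 and still map to 1, which is exactly the non-universal part.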

This allows Samba to acquire the TGT, which fixes Kerberos authentication, and it's largely smooth sailing afterwards, with the normal caveats like lack of DFS-R. In my test environment, all ADS clients started performing normally immediately after this was applied.

No idea about the long-term effects of this hack, because I currently have no plans to put Samba into production, but that's a solution, of sorts. It could break in a RODC environment but meh, who uses those.

Maybe this will help someone else struggling with a 16 year old ADS domain.

I don’t know how much effort you want to put into this, but if you want to send a patch to them at least it’s documented:
https://wiki.samba.org/index.php/Contribute

I still haven't received anything back from the address you need to email in order to be allowed to register on the bugzilla, so I guess they weren't interested and/or don't care. :man_shrugging:

Like I said above, I don't think Samba-based ADS is good enough for production use, so I don't really care if this gets fixed upstream somehow. I was mostly trying to satisfy my curiosity; it's rare that something doesn't work and it's not immediately obvious why.