Saturday 17 March 2007

LDAP Strikes again

Last night around 1 AM I was called in to work because of 'a few strange techie things happening here, Sid'.
Once I got there, I figured out the cause was a lack of internal networking.
We had our Microdowell UPS system, which holds the web/auth/smb machine, to thank for this.

That machine runs Fedora 6 and, among many other services, LDAP for smb and mail authentication.

Trying to restart the machine, I was stuck forever on:

Starting System Message Bus ....


As far as I could figure out from the boot messages, there had been a previous failure to start ldap.

At least to get the machine started, the solution seemed to be booting into runlevel 1 (which is basically the only mode that will get through) and editing /etc/nsswitch.conf.
In this file I commented out all the lines that take info from LDAP.
e.g.:

#protocols: files ldap


This allows the system to start its services without slapd support.
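If you would rather not hand-edit the file from a single-user shell, the same change can be sketched with sed. This is only a rough sketch: the function name `disable_ldap_lookups` and the `.bak` suffix are my own choices, and the demo runs on a scratch file rather than on /etc/nsswitch.conf.

```shell
#!/bin/sh
# Comment out every active line that mentions ldap in an nsswitch-style file.
# The untouched original is kept as <file>.bak so it can be restored later.
disable_ldap_lookups() {
    conf="$1"
    cp -p "$conf" "$conf.bak" || return 1
    # Skip lines that are already comments; prefix the rest with '#'
    # when they contain "ldap".
    sed '/^[[:space:]]*#/!s/^\(.*ldap\)/#\1/' "$conf.bak" > "$conf"
}

# Demo on a scratch file, so nothing under /etc is modified.
demo=$(mktemp)
printf 'passwd: files ldap\nhosts: files dns\nprotocols: files ldap\n' > "$demo"
disable_ldap_lookups "$demo"
cat "$demo"
```

On the real machine the call would be `disable_ldap_lookups /etc/nsswitch.conf`, and copying the `.bak` file back restores the original lookups.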

Now, I have heard a lot of folks blaming Samba for the same problem.

I suggest they look in the same file; they will probably notice that LDAP is set to provide Samba auth too.

So that got the server started. After I edited nsswitch.conf again and tried to restart ldap, it responded with the following:


[root@tserver]# /etc/rc.d/init.d/ldap start
Checking configuration files for slapd: bdb_db_open: unclean shutdown detected; attempting recovery.
bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if errors are encountered.
bdb_db_open: Database cannot be opened, err 13. Restore from backup!
bdb(dc=domain,dc=com): DB_ENV->lock_id_free interface requires an environment configured for the locking subsystem
backend_startup_one: bi_db_open failed! (13)
slap_startup failed (test would succeed using the -u switch)
[FAILED]
stale lock files may be present in /var/lib/ldap [WARNING]


So the server won't start at all.
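To check quickly whether a failed start matches this case, you can grep the init script output for the two telltale messages shown above. A minimal sketch follows; the function name `looks_like_bdb_failure` is made up for illustration, and the sample text is copied from the output above.

```shell
#!/bin/sh
# Succeed if the text on stdin contains one of the messages seen above.
looks_like_bdb_failure() {
    grep -q -e 'unclean shutdown detected' -e 'stale lock files may be present'
}

# Two lines taken from the failed start above, used as sample input.
sample='bdb_db_open: unclean shutdown detected; attempting recovery.
bdb_db_open: Database cannot be opened, err 13. Restore from backup!'

if printf '%s\n' "$sample" | looks_like_bdb_failure; then
    verdict="recovery needed"
else
    verdict="different problem"
fi
echo "$verdict"
```

On the server itself the input would come from the init script, for example `service ldap start 2>&1 | looks_like_bdb_failure`.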

Now, LDAP is a wonderful piece of software, but this is a very annoying problem to have.

This is what will help you:

[root@theserver]# /usr/sbin/slapd_db_recover -v -h /var/lib/ldap
Finding last valid log LSN: file: 1 offset 5324863
Recovery starting from [1][5213551]
Recovery complete at Sat Mar 17 09:13:51 2007
Maximum transaction ID 8000040d Recovery checkpoint [1][5324863]


At the end of this process you have to chown the files in /var/lib/ldap to ldap.ldap.
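The recovery tool is typically run as root, so it can leave root-owned files behind, which is why the chown matters. One way to see what still has the wrong owner is `find` with `! -user`. This is just a sketch: `list_foreign_files` is a hypothetical helper, the file names are illustrative, and the demo uses a scratch directory with the current user standing in for ldap.

```shell
#!/bin/sh
# Print every entry under a directory that is not owned by the given user.
list_foreign_files() {
    dir="$1"
    owner="$2"
    find "$dir" ! -user "$owner" -print
}

# Demo in a scratch directory; the current uid stands in for "ldap".
scratch=$(mktemp -d)
touch "$scratch/id2entry.bdb" "$scratch/__db.001"
leftover=$(list_foreign_files "$scratch" "$(id -u)")
echo "entries with the wrong owner: ${leftover:-none}"
```

On the server the call would be `list_foreign_files /var/lib/ldap ldap`; anything it prints still needs the chown.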


In conclusion, if this happens (and knowing LDAP, it will), this is what needs to be done:

  1. Boot into runlevel 1, edit /etc/nsswitch.conf, comment out the lines that mention ldap, and reboot.
  2. Once the machine is up, edit the file again to restore the ldap lines.
  3. Try service ldap start (to make sure this is your case).
  4. If the result is similar to mine above, run:
/usr/sbin/slapd_db_recover -v -h /var/lib/ldap
chown -R ldap.ldap /var/lib/ldap
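The last steps above can be sketched as one small script. Take it as a rough outline, not a tested tool. The tar backup is an extra precaution that is not part of what I did that night, the backup path is hypothetical, and `DRY_RUN` defaults to printing the commands without running them.

```shell
#!/bin/sh
# Recovery sequence from the steps above, with a backup added as a precaution.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
DB_DIR="${DB_DIR:-/var/lib/ldap}"
BACKUP="${BACKUP:-/root/ldap-db-backup.tar.gz}"   # hypothetical location

# Run a command, or just print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop at the first step that fails.
recover_ldap() {
    run tar -czf "$BACKUP" "$DB_DIR" &&
    run /usr/sbin/slapd_db_recover -v -h "$DB_DIR" &&
    run chown -R ldap.ldap "$DB_DIR" &&
    run service ldap start
}

plan=$(recover_ldap)
printf '%s\n' "$plan"
```

Run it as is to review the plan, then with `DRY_RUN=0` as root once the commands look right for your machine.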


1 comment:

Doug said...

Yea, Sid! Thanks for this post, you saved our bacon. (Whatever did we do before google?)

We didn't get any error messages from slapd, it just didn't work. It said it was working (restarted OK, etc), but it wasn't listening on port 389. When I ran it with -d 3 and saw that it was stopping on /var/lib/ldap, I googled that, found your entry, and hit it with slapd_db_recover, and we were back on the air.

Thanks for documenting your experience!

Doug