Auditd still crashing RHEL3/CentOS3 systems
This is a well-known issue, and it puzzles me that so many people don’t know about it. Whenever I hear of someone having random crashes on a system running RHEL3, the first thing to check is whether auditd is still enabled. Disabling auditd is the first thing I would recommend; only if the problem persists after that is it worth looking further. After recently doing this on several servers (you would think most people had taken care of this by now, but it is not so…), I decided to write it up as a separate blog entry that I can refer to: a short set of step-by-step instructions anyone can follow.
Why is this happening in the first place? Well, Red Hat had the bad idea of enabling auditd by default on RHEL3. Seeing how many people have had problems with it, I can only conclude that this was one of the worst ideas they could have had… Anyway, the problem shows up either as random, inexplicable system crashes or as a disk filled up with huge audit.d logs. This post shows how to disable auditd and should be useful to anyone running RHEL3, CentOS3 or any other RHEL clone with similar problems. The RHEL3 and cPanel combination is a favorite deployment in many US datacenters, so this problem is very common there. Of course, the same steps could be useful for other RHEL versions or even other Linux distributions if needed.
1. Check if auditd is indeed running.
If auditd has not been explicitly stopped on a RHEL3 system, it will be running. You can check by running:
service audit status
and if you get something like:
auditd (pid xxx) is running...
then you have it running.
You can also check whether auditd appears in the list of running processes:
ps -ef | grep auditd
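One caveat with the ps check: grep can match its own command line, so getting a line of output does not by itself prove that auditd is running. Here is a minimal sketch of a stricter check. The function name auditd_in_ps is made up for this post, and it assumes the usual `ps -ef` layout where the command is the eighth column.

```shell
# Hypothetical helper (not part of any package): reads `ps -ef` output on
# stdin and succeeds only if some process's command name is exactly auditd,
# so the "grep auditd" process itself never counts as a match.
auditd_in_ps() {
    awk '
        NR > 1 {
            cmd = $8                 # command column of ps -ef (assumed layout)
            sub(/.*\//, "", cmd)     # drop any leading directory path
            if (cmd == "auditd") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage:
#   if ps -ef | auditd_in_ps; then echo "auditd is running"; fi
```

A quicker trick that does the same job for interactive use is `ps -ef | grep '[a]uditd'`, since the bracketed pattern does not match the grep command line itself.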
2. Stop the auditd service
You can stop the auditd daemon by running:
chkconfig audit off
service audit stop
The first command prevents the service from starting again on reboot, and the second one stops the running service.
3. Remove the audit module
You would think that this was it… so what was the point of this post, to show how to stop a service? Nope… there is more. You might still see problems on systems where the auditd daemon is not running but the kernel module is still loaded. At this point we still have the audit kernel module loaded and doing ‘its work’ ;-). You can check this with:
lsmod | grep audit
Trying to remove the kernel module:
rmmod audit
will give a busy error (as the module is still in use)… Who is still using it? Well, on default RHEL3 systems cron and atd are. So in order to remove it we first need to stop those services:
service crond stop
service atd stop
rmmod audit
Now the module should no longer be loaded and you can check that out with:
lsmod | grep audit
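Note that `lsmod | grep audit` prints any line that merely contains the word "audit", so in a script it is safer to compare the module name exactly. A small sketch, assuming the module name sits in the first column of the lsmod output (audit_module_loaded is a name I made up for illustration):

```shell
# Hypothetical helper: reads `lsmod` output on stdin and succeeds only if a
# module named exactly "audit" is listed in the first column.
audit_module_loaded() {
    awk '$1 == "audit" { found = 1 } END { exit(found ? 0 : 1) }'
}

# Usage:
#   if lsmod | audit_module_loaded; then
#       echo "audit module is still loaded"
#   else
#       echo "audit module is gone"
#   fi
```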
4. Prevent the audit module from loading again
If we skip this step, starting the cron service again will load the audit kernel module back in.
echo "alias char-major-10-224 off" >> /etc/modules.conf
This will prevent the module from loading again. Now we can start the services that were running before:
service crond start
service atd start
(start only the ones you had running: most certainly cron, but atd might not be needed on all systems).
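One thing to keep in mind about the echo line in this step: it appends every time it is run, so repeating the procedure on the same server leaves duplicate lines in /etc/modules.conf. If you script this, something like the sketch below appends the line only when it is missing. The function name disable_audit_alias and its optional file argument are my own additions for illustration.

```shell
# Hypothetical helper: append the alias line to a modules.conf-style file
# only if that exact line is not already there.
# $1 is the file to edit; it defaults to /etc/modules.conf.
disable_audit_alias() {
    conf="${1:-/etc/modules.conf}"
    line="alias char-major-10-224 off"
    if grep -qxF "$line" "$conf" 2>/dev/null; then
        echo "already present in $conf"
    else
        echo "$line" >> "$conf"
        echo "added to $conf"
    fi
}

# Usage (as root):
#   disable_audit_alias
```

Taking the file as a parameter also makes it easy to try the function on a scratch copy before touching the real file.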
5. Remove existing audit.d logs
You might want to remove the audit logs, as they can be a huge amount of useless information:
rm -Rfv /var/log/audit.d/
rm -fv /var/log/audit
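Before deleting, you may want to see how much space those logs actually take. A minimal sketch (audit_log_usage is a hypothetical helper name; it simply wraps du and reports paths that do not exist):

```shell
# Hypothetical helper: for each path given, print its size in kilobytes
# (via du -sk) or a note that the path does not exist.
audit_log_usage() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            du -sk "$p"
        else
            echo "missing $p"
        fi
    done
}

# Usage:
#   audit_log_usage /var/log/audit.d /var/log/audit
```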
List of commands used:
service audit status
chkconfig audit off
service audit stop
service crond stop
service atd stop
rmmod audit
lsmod | grep audit
echo "alias char-major-10-224 off" >> /etc/modules.conf
service crond start
service atd start
rm -Rfv /var/log/audit.d/
rm -fv /var/log/audit