How to restore a hacked Linux server
Every sysadmin will try its best to secure the system/s he is managing. Hopefully you never had to restore your own system from a compromise and you will not have to do this in the future. Working on several projects to restore a compromised Linux system for various clients, I have developed a set of rules that others might find useful in similar situations. The type of hacks encountered can be very variate and you might see very different ones than the one I will present, or I have seen live, but even so, this rules might be used as a starting point to develop your own recovery plan.
In most cases if you have a system compromise at root level, you will hear that you have to fully reinstall the system and start fresh because it will be very hard to remove all the hidden files the attacker has placed on the system. This is completely true and if you can afford to do this then you should do it. Still even in this case the compromised system contains valuable information that can be used to understand the attack and prevent it in the future.
Here is a short overview of the steps that I will present:
- Don’t panic. Keep your calm and develop a plan of actions
- Disconnect the system from the network
- Discover the method used to compromise the system
- Stop all the attacker scripts and remove his files
- Restore not affected services
- Fix the problem that caused the compromise
- Restore the affected services
- Monitor the system
Don’t panic. Keep your calm and develop a plan of actions
Ok. You have just found out that you have to restore a hacked system. My first suggestion is to remain calm. Don’t rush and do something you will regret later. Why? Of course you will have to take action as soon as possible, but you can assume that the compromise is probably active for some time and if you act in second 1 or minute 10 this will probably not make much of a difference. If you have experience with such situations and have a proper plan in your mind, go for it, and don’t waste any time. If not, just relax, take 5 minutes to think about what you should do and how to solve this problem. Example of bad actions during this time: you will rush and kill all the running scripts the attacker has launched and then have a timeout when you will think what to do next… In this time the attacker might see you have discovered him (for example from his irc bot, etc.) and might become upset and clean up the system for you… Of course you should not go on with your planned trip and take care of this when you get back, I am just saying to use 5-10 minutes to think on this and develop a short action plan. There should be** no timeout in your actions** and you should always know what is the next step.
Disconnect the system from the network
This might not always be possible. But if you have physical access to the system or even if you are remotely on a system in a datacenter that provides a way to connect from a console (either a regular remote console, or a KVM, or a DRAC card in Dell servers, etc.), then this should be the next step. Connect to the remote console and bring down the network interface. If you don’t have a remote console, here are some other ideas: you might be able to rent a KVM for a limited time from your datacenter, or you might have to write some iptables rules to block any kind of access besides your own IP. After this your system will appear down to everyone, including the attacker that will see that the system is completely down.
Discover the method used to compromise the system
This step in my opinion is the most important one and you should not proceed any further until you have a proper answer to the question: “How did the attacker get in?” This step will be probably the most time consuming as the type of attacks can be quite variate. Still if you don’t find out how the attacker got in, then you will risk that you place the system online and he will be able to compromise the system in a matter of minutes. And this time he might not be so nice and you will not have anything to restore… So even if there is not a general method for this, here are some ideas to get you started:
Depending from what tools you have already configured, you need to identify the files uploaded on the system:
- if you have a system like tripwire configured use it to find out what files where added/changed.
- if you don’t have any such system installed, you might have to use the find command to search for the files newer than x days that were added to the system (also changed files in the respective interval).
Who owns the uploaded files?
- finding out the owner of the files will probably show you what application was used to get in. For example files uploaded as the web user will indicate that the web service was used to get in.
Investigate the uploaded files.
- the files that were uploaded on the system might provide valuable information about the attack. For example the attacker might use the same exploit to attack other systems from your compromised server. This can quickly show you what exploit he is using.
Get as much information from the running scripts launched by the attacker.
- as you have seen I have not recommended to stop yet the running scripts that the attacker might have launched. Why? Because they contain invaluable information to identify the attack. Use lsof on them (lsof -p PID) to see useful details. Where are they located? what user owns them? You might find the source of the attack from this information.
Use Rootkit detection tools.
- you might want to run some rootkit detections tools like rkhunter or chkrootkit to quickly identify common attacks.
Investigate system logs.
- with the information gathered by now you might reduce the size of the log information you have to investigate.
Hopefully you have found the source of the attack by now. Again this is very dependent of the type of attack. The most common one you might see these days is an exploit using a vulnerable web application and an attacker that will launch various scripts (irc bots, scanning tools, attacks on other systems, etc.). Still you might see something different like an attacker not launching any script, loading kernel modules to hide its tracks, to make it more difficult to identify or even see the compromise.
Stop all the attacker scripts and remove his files
Once you have identified the cause of the attack you can safely kill all the running scripts launched by the attacker and remove all his files (save them in a different location for further investigation). At this point we no longer need the scripts running as we got the information we could find from them. The system is still unavailable at this point and no service is available to the world.
Don’t forget to clean-up also the locations that the attacker might have entered to start his scripts on system reboot. Look in init scripts, rc.local, cron tabs for this.
Restore not affected services
Once we know the service used for the attack we can stop it and restore the network connection and all the unaffected services the system might provide. For example if the web server was used to get in, we can stop it, and restore other services like mail, dns, to minimize the downtime of the system.
Fix the problem that caused the compromise
Before starting the service used to compromise the system you should fix the problem so this will not happen again once the service is open to the public… Depending on the problem this might involve: patching a vulnerable daemon, upgrading a vulnerable web application (or temporary disable it), writing some special rulers to block it (for ex. mod_security rules might help in case of no patch available for a web application), etc.
Restore the affected services
Once you have fixed the problem you can restart the service used to get inside the system.
Monitor the system
Now is the time to monitor closely the system and see if the fix you have implemented is working. Most certainly the attacker will try to get in again as he will see he has ’lost’ the system. If you notice any problem stop the service at once and reiterate again to the step to fix the problem (stop the service, fix it, restore it).
Conclusion
These steps are obviously not usable all the time because of the variety of attacks you might encounter, but they can be used as a baseline to develop your own plan of actions. Again in such situations, keep your calm, don’t rush, and work to restore the system based on a clear set of steps. If you had similar experiences please feel free to share your own tips to help others that might find themselves in such a situation.