In this article, I’ll describe how to hunt for rootkits on Linux. Rootkits are extremely advanced pieces of code, and not just anyone can write one. However, there is a lot of proof-of-concept code demonstrating rootkit techniques and how to build one from scratch. In this article I will present a technique, based on how certain system calls execute, that can be used to detect rootkits.
I’ll explain not only how to detect rootkits but also how to conduct a forensic investigation and hunt down malware. I’ll cover only the basics at first, to understand the nature of the behavior. For those interested in learning about kernel rootkits, I have separate posts titled “Writing a Simple Rootkit for Linux” and “The linux kernel modules programming”, which I recommend reading before delving into this topic.
Kernel-mode Rootkits
The kernel is responsible for handling much of the system’s functionality on the user’s behalf, whether that is browsing local files or using a web browser to browse the Internet. This is done through system calls (low-level functions that run in kernel context).
Rootkits can live either in user-land or in kernel-land. User-land refers to privilege ring 3, while kernel-land refers to privilege ring 0. In simple terms: “In order to stay invisible backdoors modify kernel structures and code, causing that nobody can trust the kernel. Nobody”
Privilege levels:
ring 0 (the most privileged level)
ring 1
ring 2
ring 3 (the least privileged level)
Loadable kernel modules give administrators the ability to load drivers and other code at runtime, removing the need to recompile the kernel and reboot. Kernel rootkits typically leverage this to run code directly in kernel space.
Usually the rootkit code is designed to be loaded as a kernel module. This lets the rootkit exploit dynamic module loading to inject its code directly into the running kernel, without recompiling the kernel or rebooting the system. Let’s take a look at a Linux Kernel Module (LKM) rootkit.
Many rootkits modify the syscall table in order to redirect useful system calls such as sys_read(), sys_write(), sys_getdents(), etc. Let’s say we have rootkit.ko. To gain access to the kernel’s functions and data structures, it must first locate the sys_call_table:
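/* Code snippet from hijack_execve() function */
/* Resolve the address of the syscall table by symbol name */
syscall_table = (void *)kallsyms_lookup_name("sys_call_table");
/* Keep the original handler so it can still be called or restored */
real_execve = (void *)syscall_table[__NR_execve];
/* Point the execve slot at the rootkit's replacement */
syscall_table[__NR_execve] = &new_execve;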
This function uses kallsyms_lookup_name() to locate the address of the sys_call_table. After locating the table, the rootkit replaces the address of the original execve() system call with the address of its own new_execve() function. This hooks execve(): whenever any process calls execve(), the rootkit’s new_execve() function is executed instead.
Finally, once the rootkit is compiled, it can be loaded into the kernel. Once the module is loaded, its code executes directly in kernel space with full privileges (ring 0).
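To make the redirection concrete, here is a toy user-space model of the same idea. It is only a sketch: the table, the handlers and every name in it (demo_table, hijack_exec, count_modified and so on) are made up for illustration, and none of it is kernel code. The last helper, which compares the table against a known-good copy, is my own addition and assumes such a copy exists.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model: the "syscall table" is just an array of function
 * pointers indexed by call number. All names here are hypothetical. */
typedef int (*handler_t)(int arg);

enum { NR_DEMO_READ, NR_DEMO_EXEC, NR_DEMO_MAX };

static int real_read(int arg) { return arg; }
static int real_exec(int arg) { return arg + 1; }

static handler_t demo_table[NR_DEMO_MAX] = { real_read, real_exec };
static handler_t saved_exec;   /* original handler kept by the hook */
static int hook_calls;         /* how many times the hook has run */

/* Replacement handler: does its own work, then forwards to the original. */
static int new_exec(int arg)
{
    hook_calls++;
    return saved_exec(arg);
}

/* Same steps as the kernel snippet: find the slot, save it, overwrite it.
 * Call this only once, otherwise the hook would forward to itself. */
static void hijack_exec(void)
{
    saved_exec = demo_table[NR_DEMO_EXEC];
    demo_table[NR_DEMO_EXEC] = new_exec;
}

/* Callers never name a handler directly; they index the table. */
static int dispatch(int nr, int arg)
{
    assert(nr >= 0 && nr < NR_DEMO_MAX);
    return demo_table[nr](arg);
}

/* Count the entries that differ from a known-good copy of the table. */
static size_t count_modified(const handler_t *baseline)
{
    size_t n = 0;
    for (size_t i = 0; i < NR_DEMO_MAX; i++)
        if (demo_table[i] != baseline[i])
            n++;
    return n;
}
```

Before hijack_exec() runs, dispatch(NR_DEMO_EXEC, ...) goes straight to real_exec(). Afterwards the same call passes through new_exec() first, the caller sees the same return value, and count_modified() reports one changed entry.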
File System Analysis
So, how do we detect kernel-based rootkits? The main problem is that we cannot trust the kernel. However, we can still get some reliable information from it, meaning we want to use the kernel to help us detect the “bad” code that has full control of it. The question is: which kernel functions should we check?
Let’s back up a little. What is the main task of every rootkit? Its job is to hide the presence of its author in the system: processes, files and connections. These things should be hidden from tools such as ls, ps, netstat, etc. These programs collect system information through some well-known system calls.
Even if the malware does not touch the syscall table directly, it modifies some kernel functions that are invoked by one of the system calls. The problem is that these modified functions do not have to be executed during every system call. For example, if we modify only a pointer to the read functions in procfs, the code will be executed only when read() is called to read some specific file, like /proc/net/
This is what makes rootkits challenging to detect. The method described here is an old one, or rather not so much old as outpaced: modern rootkits use more advanced techniques such as hooking and evasion, which I’ll get into later. The heuristic approach is basically to measure the execution time of a particular system call with different arguments. For example, we test sys_read() by reading "/etc/passwd" and "/proc/net/" (i.e. reading a regular file, a device and a pseudo proc-file).
These measurements are intended to identify anomalies that may indicate the presence of a rootkit. Instead of testing all 230 system calls, the method focuses on a subset of system calls that rootkits commonly use to perform specific tasks, for example testing the sys_read() call with different file paths.
This method is based on the assumption that typical rootkit tasks, such as hiding processes or files, involve only a limited set of system calls. Rootkits may also manipulate or hide information in procfs to conceal their activities. The "/proc" directory provides a pseudo-file system that exposes various kernel data structures and system information.
How does this approach work?
First we pick the subset of system calls; next we measure the execution time of each system call with its chosen arguments. On a normal, non-compromised system, the execution time of the same system call with different arguments should not vary significantly. However, if a rootkit is present and actively manipulating the system, there may be inconsistencies or delays in the execution time of certain system calls.
/* Measure sys_read() execution time */
time1 = measure_sys_read(FILE_1); // /etc/passwd
time2 = measure_sys_read(FILE_2); // /proc/net/*
Here we open the file, read some data and then calculate the execution time. After measuring the execution times for the two files, we compare them to detect any significant differences. However, there is one problem with this method: false positives.
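A helper like measure_sys_read() could look roughly like the sketch below. It is an assumption on my part rather than the exact code behind the snippet: it times a single read() of up to 256 bytes with clock_gettime(CLOCK_MONOTONIC) and returns the elapsed nanoseconds, or -1 on failure. The buffer size and the return convention are arbitrary choices for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Time one read() of up to 256 bytes from `path`.
 * Returns the elapsed time in nanoseconds, or -1 if the file
 * cannot be opened or the read fails. */
static int64_t measure_sys_read(const char *path)
{
    assert(path != NULL);

    char buf[256];
    struct timespec start, end;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Only the read() itself sits between the two timestamps. */
    clock_gettime(CLOCK_MONOTONIC, &start);
    ssize_t n = read(fd, buf, sizeof buf);
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    if (n < 0)
        return -1;

    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}
```

A single sample like this is very noisy (caches, scheduling, other load), so one call per file is a rough indication at best.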
The Linux kernel is a complex program, and most system calls have many if-then clauses, which means different paths are executed depending on many factors. The cause of a timing difference can be almost anything, so this approach requires good knowledge of the system and enough experience to perform further analysis. The same goes for the tools out there, such as rkhunter and chkrootkit. While they implement more advanced techniques than simple execution time measurement, they are not immune to false positives or false negatives.
The effectiveness of these tools depends on various factors, including the accuracy of the heuristics used, the timeliness of their signature databases and the expertise of the user.
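One way to soften the false positive problem, at least for the timing heuristic, is to take many samples and compare medians instead of single values. The sketch below assumes you already have an array of timings in nanoseconds and a baseline taken on a system you believe is clean. The names median_ns and is_anomalous and the fixed threshold factor are illustrative assumptions, not part of any tool mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort() comparator for int64_t values, ascending. */
static int cmp_i64(const void *a, const void *b)
{
    int64_t x = *(const int64_t *)a;
    int64_t y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n timing samples. Sorts the array in place. */
static int64_t median_ns(int64_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    qsort(samples, n, sizeof samples[0], cmp_i64);
    if (n % 2)
        return samples[n / 2];
    return (samples[n / 2 - 1] + samples[n / 2]) / 2;
}

/* Flag a measurement only when it exceeds the baseline by more than
 * `factor` times; the right factor has to be tuned per system. */
static int is_anomalous(int64_t baseline_ns, int64_t observed_ns, double factor)
{
    assert(baseline_ns > 0 && factor > 0.0);
    return (double)observed_ns > (double)baseline_ns * factor;
}
```

For example, with samples of 900, 50000, 1000, 1100 and 950 ns the median is 1000 ns, so the single 50000 ns outlier does not raise an alert on its own.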
Memory Analysis
Next, I will walk through a forensic analysis case using the Volatility memory analysis framework. After a system has been compromised, it becomes crucial to extract forensically relevant information. RAM is volatile: its contents are lost each time the computer is restarted. This means that if a hacked computer is restarted, a significant amount of information that could reveal how the system was initially compromised will be lost. This is where Volatility comes into play, as a valuable tool for analyzing a captured image of a system’s volatile memory.
Typically, one would begin by extracting system information and gathering data about the operating system (OS) and its base configuration. However, this is not a Volatility tutorial, and the rootkit does not leave any obvious traces that identify the infection in this memory image. Therefore, we must conduct a more extensive investigation, beginning with the process listings.
First, check which processes were running on the system when the memory dump was taken, using the linux_psaux plugin. You can list the available plugins with:
$ python3 vol.py --info | grep linux_
This plugin provides a full process listing of the system. Its output is approximately the same as what running the ps -aux command in a terminal would produce. In the resulting output, everything in the list of processes appeared normal, with the exception of one process, the very last in the list. Its name, F00, is not that of a known standard Linux process.
Next we use linux_pslist. This plugin prints the list of active processes, starting from the init_task symbol and walking the task_struct->tasks linked list. It does not display the swapper process. If the DTB column is blank, the item is likely a kernel thread. The same number of processes was found using this plugin as with the previous one. Again, the only process that did not appear to belong was F00.
Next, to identify the files to dump for further analysis, we use the linux_lsof plugin, which mimics the lsof command on a live system. It prints the list of open file descriptors and their paths for each running process:
Pid      FD       Path
-------- -------- ----
1        0        /dev/null
1        1        /dev/null
1        2        /dev/null
After listing the open files and tracking down the suspicious process, it’s time to use linux_proc_maps, which prints details of process memory, including heaps, stacks and shared libraries. This very powerful plugin can be used to learn important information about the underlying system as a whole:
0x8050000-0x8051000 r-x 2777 /usr/F_00/F_00
0x8051000-0x8052000 rw- 4096 2777 /usr/F_00/F_00
0xb75d7000-0xb75d8000 rw- 0 0
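To read rows like the ones above programmatically, the leading fields can be parsed with sscanf(). This is a small sketch that assumes each row starts with a start-end address range followed by a three-character permission string, as in the listing. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Leading fields of one memory-map row. */
struct region {
    unsigned long start;
    unsigned long end;
    char perms[4];          /* e.g. "r-x", NUL-terminated */
};

/* Parse "start-end perms" from the beginning of a row.
 * Returns 1 on success, 0 if the row does not match that shape. */
static int parse_region(const char *row, struct region *out)
{
    assert(row != NULL && out != NULL);
    return sscanf(row, "%lx-%lx %3s",
                  &out->start, &out->end, out->perms) == 3;
}

/* Size of the mapping in bytes. */
static unsigned long region_size(const struct region *r)
{
    return r->end - r->start;
}

/* True when the third permission character marks the mapping executable. */
static int region_is_exec(const struct region *r)
{
    return r->perms[0] != '\0' && r->perms[1] != '\0' && r->perms[2] == 'x';
}
```

For the first row, the range 0x8050000-0x8051000 is 0x1000 (4096) bytes long and the r-x flags mark it as executable.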
What this plugin reveals is the actual location of the files associated with the suspicious process F_00, specifically /usr/. The first mapping also stands out through its r-x permission. Interestingly, this process relies on only two libraries, whereas most system processes rely on many more.
Finally, it’s time to dump this process directly from the memory image. To do that, we call linux_find_file. This plugin can not only dump pre-identified files from the memory image (using information obtained from other plugins) but also list all filesystem objects with an open handle in memory. We can locate any target file with the -F argument and then dump it by its inode address, for example:
$ python3 vol.py ... linux_find_file -F "/usr/F_00/F_00"
Inode Number Inode
-------------------- ---------------
0161170 0xf
$ python3 vol.py ... linux_find_file -i 0xf -O mal.elf
The “Inode” column is the address in memory where this file’s inode structure is stored. With this information, the file “mal.elf” was dumped from the memory image. The next step is to compute the hash of the file and check for a match in a malware database, or to perform further analysis on the binary.
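Before hashing or disassembling the dump, a quick sanity check is to confirm that it really starts with the ELF magic bytes (0x7f followed by "ELF"). The helper below is a minimal sketch with a hypothetical name. Passing this check only tells you the header looks like ELF; it says nothing about whether the file is malicious.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the file starts with the 4-byte ELF magic,
 * 0 if it does not, and -1 if the file cannot be opened. */
static int has_elf_magic(const char *path)
{
    assert(path != NULL);

    unsigned char hdr[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    size_t got = fread(hdr, 1, sizeof hdr, f);
    fclose(f);

    if (got != sizeof hdr)
        return 0;   /* too short to hold the magic */

    /* The literal is split so that 'E' is not read as part of the hex escape. */
    return memcmp(hdr, "\x7f" "ELF", 4) == 0;
}
```

Calling has_elf_magic("mal.elf") on the dumped file is then a one-line check before moving on to hashing.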
Of course, in a real-world situation it won’t be this easy, but hopefully you learned something about memory analysis and how it makes it possible not only to identify a suspect file but also to dump it. Additionally, it’s important to note that rootkits may create concealed network connections, a topic we can explore in a separate post focused on analyzing network traffic.