Extracting reliable data for short-lived processes using eBPF for Linux Security Threat Analysis

Day 1 | 16:20 | 00:20 | K.4.201 | Ankit Garg, Meghna Vasudeva, Lakshmy A V

Endpoint Detection and Response (EDR) solutions continuously monitor the events occurring on an endpoint, build detailed process timelines, and analyse this data to reveal suspicious patterns that could indicate threats such as ransomware. To build these timelines, EDR solutions collect a variety of information about each process running on the endpoint, such as its timestamp, PID, process name, path and command line. On Linux systems, this is often done using the proc filesystem, which exposes rich information about every process currently running on the system. However, if a process is short-lived and exits quickly, its proc filesystem entries are removed before the EDR solution can read them, leaving gaps in the timeline.

As a simple example, suppose a malicious actor runs a script that downloads a binary from the network and then executes it. The downloaded binary quickly spawns several long-running malicious processes and then exits. Because the binary is short-lived, the EDR solution may be unable to extract its complete process information from the proc filesystem, and so it misses details about the creator of the malicious processes. The EDR solution is then left with visibility gaps about the downloaded binary, which is exactly the information Security Threat Analysis requires.

We propose to close these gaps by dynamically attaching extended Berkeley Packet Filter (eBPF) programs to Linux kernel system calls (such as fork, exec and exit) and extracting all the required information directly from these kernel hooks using BPF Type Format (BTF). The hooks read the kernel's fundamental task structure (task_struct), which represents a process or thread, to extract a variety of information about the process.
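The race described above is easy to reproduce. The following Python sketch (Linux only; the helper names `read_proc_cmdline` and `probe_short_lived` are illustrative and not taken from the talk) starts a process that exits immediately, waits for it, and only then reads `/proc/<pid>/cmdline`, the way a periodic collector would. To keep the demo self-contained, the collector is also the parent that reaps the child; on a real endpoint the entry disappears once the actual parent reaps it, and even before that an exited (zombie) process no longer exposes its command line.

```python
from __future__ import annotations

import os
import subprocess


def read_proc_cmdline(pid: int) -> list[str] | None:
    """Return the argv of `pid` as read from /proc, or None if the entry is gone."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            raw = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Arguments are NUL-separated; a zombie or kernel thread yields an empty buffer.
    return [a.decode(errors="replace") for a in raw.split(b"\0") if a]


def probe_short_lived() -> list[str] | None:
    """Start a process that exits at once, then read /proc as a polling collector would."""
    child = subprocess.Popen(["sh", "-c", "exit 0"])
    child.wait()  # the child has exited and been reaped, so /proc/<pid> is removed
    return read_proc_cmdline(child.pid)


if __name__ == "__main__":
    print("live process:", read_proc_cmdline(os.getpid()))
    print("short-lived process:", probe_short_lived())
```

On Linux, the first line typically shows the interpreter's own argument list, while the second shows `None` because the short-lived process's `/proc` entry no longer exists by the time it is read.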
Our proof of concept shows that eBPF-based process data extraction produces process timelines with near 100% reliability, compared with only around 83% for proc filesystem-based approaches on short-lived processes.
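Event-driven collection avoids the race because each record is captured at the moment a kernel hook fires instead of being reconstructed later from `/proc`. The sketch below models only the user-space side of such a pipeline, under stated assumptions: `ProcEvent` is a hypothetical event layout standing in for what fork, exec and exit hooks might emit after reading `task_struct`, and `build_timeline` is a hypothetical helper that folds those events into per-PID records. It is an illustration, not the authors' implementation; for brevity it keys records by PID alone and ignores PID reuse.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProcEvent:
    """Hypothetical event a kernel hook could emit after reading task_struct."""
    kind: str           # "fork", "exec" or "exit"
    ts_ns: int          # kernel timestamp of the event, in nanoseconds
    pid: int            # process the event refers to
    ppid: int           # its parent at the time of the event
    filename: str = ""  # executed binary path, only meaningful for "exec"


@dataclass
class ProcessRecord:
    """One entry of the process timeline; it outlives the process itself."""
    pid: int
    ppid: int
    start_ns: int
    images: list[str] = field(default_factory=list)  # binaries executed, in order
    exit_ns: int | None = None                       # None while still running


def build_timeline(events: list[ProcEvent]) -> dict[int, ProcessRecord]:
    """Fold fork/exec/exit events into per-PID records (ignores PID reuse)."""
    timeline: dict[int, ProcessRecord] = {}
    for ev in sorted(events, key=lambda e: e.ts_ns):
        rec = timeline.get(ev.pid)
        if ev.kind == "fork":
            timeline[ev.pid] = ProcessRecord(ev.pid, ev.ppid, ev.ts_ns)
        elif ev.kind == "exec":
            if rec is None:  # process started before collection began
                rec = ProcessRecord(ev.pid, ev.ppid, ev.ts_ns)
                timeline[ev.pid] = rec
            rec.images.append(ev.filename)
        elif ev.kind == "exit" and rec is not None:
            rec.exit_ns = ev.ts_ns  # mark the exit but keep the record
    return timeline


if __name__ == "__main__":
    # Hypothetical PIDs and paths mirroring the dropper scenario in the abstract.
    events = [
        ProcEvent("fork", 1, 200, 100),
        ProcEvent("exec", 2, 200, 100, "/tmp/dropper"),
        ProcEvent("fork", 3, 201, 200),
        ProcEvent("exec", 4, 201, 200, "/tmp/payload"),
        ProcEvent("exit", 5, 200, 100),
    ]
    for rec in build_timeline(events).values():
        print(rec)
```

In this model the short-lived dropper's record keeps its executed path and exit time after it has gone, and the long-running child's record still names it as the parent, which is the information the proc filesystem approach can lose.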