
A Unified I/O Monitoring Framework Using eBPF

H.1308 (Rolin) | Day 1 | 14:45 - 15:15 | Speakers: Mahendra Paipuri

A picture of a devroom at FOSDEM 2024

Notes

Abstract

The interoperability of I/O monitoring and profiling tools is very limited because they depend strongly on the underlying file system (Lustre, Spectrum Scale, NFS, etc.) and resource manager (batch jobs, VMs, containerized workloads, etc.). Widely adopted generic monitoring tools often lack temporal information about I/O activity, which is usually needed to understand the I/O behavior of applications. The increasing diversity of applications and computing platforms demands greater flexibility and scope in I/O characterization.

This talk proposes a framework for monitoring I/O activity using extended Berkeley Packet Filter (eBPF) technology, which has gained much traction in the observability and cloud-native landscape. By tracing the kernel's Virtual File System (VFS) functions with eBPF, it is possible to monitor I/O activity on different types of platforms, such as HPC clusters, cloud hypervisors, or Kubernetes. By storing the metrics collected by the eBPF programs in a high-performance time-series database such as Prometheus, computing platforms that use different local or remote file systems can be monitored system-wide in a unified manner.

The talk presents the basics of eBPF and discusses the framework used to monitor I/O activity in a file-system- and application-agnostic way. It also presents experimental results quantifying the overhead and accuracy of the proposed framework, using IOR benchmark results as the reference. The results indicate that the framework adds negligible overhead and that the bandwidths it reports are in very good agreement with those from the IOR tests. Finally, results are presented from a production HPC platform that uses the framework to monitor I/O activity on the Lustre file system.

Attachments

Speakers

Mahendra Paipuri

Mahendra has a doctorate in applied mathematics from Universidade de Lisboa, Portugal, and an M.Sc. in computational sciences from Universitat Politècnica de Catalunya, Barcelona.

After his doctorate, he was a postdoctoral researcher at Université Gustave Eiffel, working on an ERC project focused on macroscopic modelling of urban transportation networks. He later worked for Inria as a research engineer within SKAO on software-hardware co-design activities for the Science Data Processor (SDP).

Since the beginning of 2022, he has been working for CNRS as a permanent research engineer. He spent more than 3 years at the national HPC center of CNRS as a system/solutions architect. Mahendra joined CDSP in October 2025 to lead the digital projects team.


Notice: The placeholder video image is licensed under CC BY-SA 4.0. The original image can be found here. Changes made to the image: cropped to a new aspect ratio, cutting off part of the image.