Multi-Petabyte Data Distribution in Industry & Science with CernVM File System
UB4.136 | Day 1 | 12:15 - 12:45 | Speakers: Andriy Utkin, Georgios Christodoulis
Abstract
The CernVM File System (CVMFS) is a scalable, high-performance distributed filesystem developed at CERN to deliver software and static data efficiently across global computing infrastructures. It was designed primarily for high-energy physics (HEP): for the Large Hadron Collider (LHC) alone, CVMFS serves around 4 billion files (~2 PB of data). CVMFS uses a content-addressable storage model, in which each file is stored under the cryptographic hash of its content; this guarantees integrity and enables deduplication. It follows a multi-tier caching architecture: data are published to a single source of truth (Stratum 0), mirrored by a network of distributed servers (Stratum 1), and delivered to clients through forward proxies. This layered caching makes CVMFS a cost-effective alternative to traditional filesystems, giving clients reliable, low-overhead access to versioned, read-only datasets. In this talk we will focus on how CVMFS interoperates with widely adopted S3 object storage, providing a conventional POSIX filesystem view of the objects and using the available metadata to exploit the medium efficiently. We will also highlight the benefits of using CVMFS with containerized workflows and demonstrate tools developed to facilitate data publishing.
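To make the content-addressed model and the tiered caching concrete, here is a minimal Python sketch. It is not CVMFS code. SHA-256, the `data/xx/...` path layout, and the class names `ContentAddressedStore` and `CachingTier` are hypothetical choices for illustration. Each tier (Stratum 0, Stratum 1, proxy) is modelled as an in-memory dictionary.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


class ContentAddressedStore:
    """Toy 'Stratum 0': objects are keyed by the hash of their bytes."""

    def __init__(self):
        self._objects = {}  # object path -> bytes

    @staticmethod
    def object_path(digest: str) -> str:
        # Hypothetical fan-out layout: the first two hex digits pick a subdirectory.
        return f"data/{digest[:2]}/{digest[2:]}"

    def put(self, content: bytes) -> str:
        digest = sha256_hex(content)
        # Identical content maps to the same path, so it is stored only once.
        self._objects.setdefault(self.object_path(digest), content)
        return digest

    def get(self, digest: str) -> bytes:
        content = self._objects[self.object_path(digest)]
        # Integrity check: the stored bytes must hash back to the requested key.
        if sha256_hex(content) != digest:
            raise ValueError(f"corrupted object {digest}")
        return content

    def __len__(self) -> int:
        return len(self._objects)


class CachingTier:
    """Toy read-through cache (a mirror or proxy) in front of an upstream tier."""

    def __init__(self, name: str, upstream):
        self.name = name
        self.upstream = upstream
        self.cache = {}  # digest -> bytes
        self.hits = 0
        self.misses = 0

    def get(self, digest: str) -> bytes:
        if digest in self.cache:
            self.hits += 1
            return self.cache[digest]
        self.misses += 1
        content = self.upstream.get(digest)
        # Verify before caching so a corrupted upstream copy is never propagated.
        if sha256_hex(content) != digest:
            raise ValueError(f"{self.name}: corrupted object {digest}")
        self.cache[digest] = content
        return content


if __name__ == "__main__":
    stratum0 = ContentAddressedStore()
    d1 = stratum0.put(b"libfoo.so contents")
    d2 = stratum0.put(b"libfoo.so contents")  # duplicate publish
    print(d1 == d2, len(stratum0))  # True 1

    stratum1 = CachingTier("stratum1", stratum0)
    proxy = CachingTier("proxy", stratum1)
    proxy.get(d1)  # miss at proxy and at Stratum 1, served by Stratum 0
    proxy.get(d1)  # hit at proxy
    print(proxy.hits, proxy.misses, stratum1.misses)  # 1 1 1
```

Because a key is the hash of immutable content, a cached object in this sketch can never go stale. Any tier can keep it indefinitely and re-verify it on its own, which is what lets untrusted intermediate caches sit between the source and the clients.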
Homepage: https://cernvm.web.cern.ch/fs/
Documentation: https://cvmfs.readthedocs.io/
Development: https://github.com/cvmfs/cvmfs/
Forum: https://cernvm-forum.cern.ch/
Speakers
Linux software engineer since 2007. FOSS contributor since 2011: FFmpeg, GStreamer, kernel, Gentoo, Bluecherry DVR.
Own products that have received public attention include:
Georgios (George) is a computer engineer working as a fellow at CERN. He previously worked on heterogeneous computing in the context of HPC and on the development of OSI layer 3 network systems. Since 2025 he has been part of the core development team of the CernVM File System (CVMFS), which is used to distribute software to users in science and industry.
