Skip to main content

Multi-Petabyte Data Distribution in Industry & Science with CernVM File System

UB4.136 | Day 1 | 12:15 - 12:45 | Speakers: Andriy Utkin, Georgios Christodoulis

Multi-Petabyte Data Distribution in Industry & Science with CernVM File System
A picture of a devroom at FOSDEM 2024
Open in browser

Notes

Abstract

The CernVM File System (CVMFS) is a scalable, high-performance distributed filesystem developed at CERN to efficiently deliver software and static data across global computing infrastructures, primarily designed for high-energy physics (HEP). For the Large Hadron Collider (LHC) only, CVMFS is serving around 4 billion files (~2PB of data). CVMFS uses a content-addressable storage model, where files are stored in the form of cryptographic hashes, ensuring integrity and enabling deduplication. It follows a multi-caching architecture where the data are published in a single source of truth (Stratum 0), mirrored by a network of distributed servers (Stratum 1), and propagated to the clients via forward proxies. This multi-layer of caching allows for a cost-effective alternative to traditional file systems, where clients are offered reliable access to versioned read-only datasets with low overhead. In this talk we will focus on how CVMFS interoperates with the highly adopted S3 storage, providing a conventional POSIX filesystem view of the objects, using the available metadata for efficient exploitation of the medium. We will also highlight the benefit of using CVMFS with containerized workflows and demonstrate tools developed to facilitate data publishing.

Homepage: https://cernvm.web.cern.ch/fs/
Documentation: https://cvmfs.readthedocs.io/
Development: https://github.com/cvmfs/cvmfs/
Forum: https://cernvm-forum.cern.ch/


Notice: The placeholder video image is licensed under CC BY-SA 4.0. The original image can be found hereChanges made to the image are: Cropped the image to a new ratio, part of the image was cut off.