CRA-ppy data: We need better open data for CRA compliance
UA2.114 (Baudoux) | Day 1 | 18:15 - 18:30 | Speakers: Georg Link, Thomas Steenbergen
Abstract
Everyone's building CRA compliance tooling: SBOM generators, vulnerability scanners, security scorecards, automated due diligence checks. But CRA readiness isn't just about tooling. It's about ensuring that the data feeding those tools is actually accurate and trusted. The project activity, package metadata, licensing information, and vulnerability data these tools depend on are systematically unreliable, and we need to fix them at the source.
This talk demonstrates why data accuracy is the blocking issue for practical CRA readiness. We'll show real-world examples from major package ecosystems: Python packages with wrong license declarations, Java JARs with embedded vulnerable dependencies that scanners miss, and Rust crates with incomplete origin metadata. When you need to demonstrate due diligence or attempt automated vulnerability reporting, these underlying data failures make compliance impossible, no matter how good your tools are.
The good news is that this is solvable, and the FOSS community is already working on it!
We'll present concrete approaches being deployed across ecosystems: systematic metadata curation projects that scan and fix package data at scale, validation tooling that catches errors before publication, and community infrastructure that makes accurate software metadata freely available. You'll see how projects like Maven Heaven, T-Rust, and Nixpkgs Clarity are cleaning up metadata for the most popular packages, releasing curated data under open licenses, and providing author-facing tools to prevent bad data from entering registries. And we'll discuss how reliable project health data provides critical insights for proactive CRA due diligence and risk management.
This session gives you practical next steps: how to audit data quality in your dependencies, contribute to metadata curation efforts, integrate validation into your publishing workflow, and leverage community-curated data for more reliable compliance automation.
Speakers
Georg Link is an Open Source Strategist. Georg’s mission is to help open source projects, communities, and companies become more professional in their use of metrics and analytics. Georg cofounded the Linux Foundation CHAOSS Project to advance analytics and metrics for open source project health. Georg has 20 years of experience as an active contributor to several open source projects and has presented on open source topics at 50+ conferences. As the Director of Sales at Bitergia, Georg helps organizations and communities obtain professional metrics and analytics to solve their business needs effectively and efficiently. In his spare time, Georg enjoys reading fiction and scuba diving.
Thomas Steenbergen specializes in strategic open source management, helping organizations align their open source practices with their objectives. He is an expert in open source adoption, community building, and compliance, including Software Bills of Materials (SBOMs). He is currently the executive director of the AboutCode Foundation and an advisor to the Open Source Program Office at SIVON (ICT co-op of Dutch schools). He previously led the OSPOs at EPAM and HERE. He is a maintainer of the OSS Review Toolkit, SPDX, and the TODO Group, and a regular contributor to FINOS’s Open Source Readiness and OpenChain. Thomas welcomes discussions on open source topics. For more information about the projects he is involved in and his contact details, visit github.com/tsteenbe.
