Skip to main content

Supply chain security meets AI: Detecting AI-generated code

UB5.132 | Day 1 | 17:30 - 17:55 | Speakers: Philippe Ombredanne

Supply chain security meets AI: Detecting AI-generated code
A picture of a devroom at FOSDEM 2024
Open in browser

Notes

Abstract

Everyone's excited (sarcasm) that AI coding tools make developers more productive. Security teams are excited too - they've never had this much job security.

LLMs and AI-assisted coding tools are writing billions of lines of code, so teams can ship 10x faster. They're also inheriting vulnerabilities 10x faster.

We need to detect AI-generated code and trace it back to its FOSS origins. The challenge: exact matching doesn't work for AI-generated code since each generation may have small variations given the same input prompt.

AI-Generated Code Search (https://github.com/aboutcode-org/ai-gen-code-search) introduces a new approach using locality-sensitive hashing and content-defined chunking for approximate matching that actually works with AI output variations. This FOSS project delivers reusable open source libraries, public APIs, and open datasets that make AI code detection accessible to everyone, not just enterprises with massive budgets.

In this talk, we'll explain how we fingerprint code fragments for fuzzy matching, build efficient indexes that don't balloon to terabytes, and trace AI-generated snippets back to their training data sources. We'll demo real examples of inherited vulnerabilities, show how it integrates with existing FOSS tools for SBOM and supply chain analysis, and explain how this directly supports CRA compliance for tracking code origin.

Bottom line: if AI-generated code is in your dependencies (and it probably is), you need visibility into what it's derived from and what risks it carries. This project gives you the FOSS tools and data to find out.


Notice: The placeholder video image is licensed under CC BY-SA 4.0. The original image can be found hereChanges made to the image are: Cropped the image to a new ratio, part of the image was cut off.