Synthetic Data: The Secret Ingredient in Better Language Models

UB2.252A (Lameere) | Day 2 | 14:10 - 14:25 | Speakers: Carol Chen, Cedric Clyburn

Abstract

What’s powering the next generation of AI breakthroughs without relying on massive amounts of human-labeled data? Synthetic data generation has emerged as a transformative approach to enhancing Large Language Models (LLMs). This session demystifies synthetic data, exploring how it’s created, the methodologies behind it, and its impact on AI. Learn how synthetic data bridges knowledge gaps, accelerates training at scale, and improves performance on tasks such as natural language understanding and complex reasoning. Whether you’re intrigued by the technical mechanics or the real-world applications, this talk will equip you with actionable insights for practicing synthetic data generation and for understanding how synthetic data is rapidly changing the world of language models.

Speakers

Carol Chen
Cedric Clyburn
