A long, short history of realtime AI agents
K.3.601 | Day 1 | 17:10 - 17:25 | Speakers: Rob Pickering
Abstract
Until a few months ago, the only working approach for connecting realtime AI agents to WebRTC streams and phone calls was a lengthy pipeline of speech-to-text, agent orchestration, and text-to-speech, often using multiple machine learning models from commercial vendors. That has changed with new realtime speech-to-speech models, most famously the (closed) OpenAI Advanced Voice, but what are the open source ways to build these kinds of systems? This talk walks through my experience of using four different projects to build functional systems that can use open source (open weights) models at their core. We will talk about how we have integrated Jambonz, Livekit, and Ultravox (Fixie.AI) within our Aplisay framework and what this allows us to do.
Links
- Live interactive talk facilitated by WebRTC AI
- Video recording (AV1/WebM) - 31.3 MB
- Video recording (MP4) - 242.9 MB
- Video recording subtitle file (VTT)
- The talk recording didn't work out too well, so here is a bit more info about the session, including a link to the self-paced presentation
- Chat room (web)
- Chat room (app)
- Submit Feedback
Notice: The placeholder video image is licensed under CC BY-SA 4.0. The original image can be found here. Changes made to the image: cropped to a new ratio, so part of the image was cut off.
