A long, short history of realtime AI agents

Until a few months ago, the only working approach for connecting realtime AI agents to WebRTC streams and phone calls was a lengthy pipeline of speech-to-text, agent orchestration, and text-to-speech, often using multiple machine learning models from commercial vendors. That has changed with new realtime speech-to-speech models, most famously the (closed) OpenAI advanced voice, but what are the open source ways to build these kinds of systems? This talk walks through my experience of using four different projects to build functional systems that can use open source (open weights) models at their core. We will talk about how we have integrated Jambonz, LiveKit, and Ultravox (Fixie.AI) within our Aplisay framework and what this allows us to do.