FOSDEM 2025
Self-hosted LLMs at a scale with Paddler
UB2.252A (Lameere) | Day 2 | 12:40 - 13:00 | Speakers: Mateusz Charytoniuk
Abstract
Paddler is an open-source llama.cpp load balancer designed to address the unique challenges that Large Language Models (LLMs) pose.
Typical load-balancing algorithms such as round-robin or least-connections are not the most efficient approaches for this kind of workload.
To make your infrastructure more predictable, Paddler uses alternative strategies that account for unpredictable response times and still allow services to be scaled up and down at any moment.
This talk will demonstrate Paddler's general design concepts (the "why") and some primary use cases (the "how").
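To illustrate why round-robin can be a poor fit when response times are unpredictable, here is a minimal Python sketch. It is not Paddler's implementation. It assumes, purely for illustration, that each backend exposes a fixed number of processing slots and that the balancer knows how many are busy. The names `Backend`, `slots_total`, `slots_busy`, `pick_round_robin`, and `pick_most_idle` are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Backend:
    """A hypothetical inference server with a fixed number of processing slots."""
    name: str
    slots_total: int
    slots_busy: int = 0

    @property
    def slots_idle(self) -> int:
        return self.slots_total - self.slots_busy


def pick_round_robin(rotation):
    """Return the next backend in the rotation, regardless of its current load."""
    return next(rotation)


def pick_most_idle(backends):
    """Return the backend with the most idle slots, or None if all are saturated."""
    best = max(backends, key=lambda b: b.slots_idle)
    return best if best.slots_idle > 0 else None


# Assumed snapshot of three backends; "a" is fully occupied by long generations.
backends = [
    Backend("a", slots_total=4, slots_busy=4),
    Backend("b", slots_total=4, slots_busy=1),
    Backend("c", slots_total=4, slots_busy=3),
]

rotation = cycle(backends)
rr_choice = pick_round_robin(rotation)  # selects "a" although it has no idle slot
idle_choice = pick_most_idle(backends)  # selects "b", which has 3 idle slots

print(rr_choice.name, idle_choice.name)
```

In this snapshot, round-robin sends the next request to a saturated backend, while the load-aware choice goes to the backend with spare capacity. Returning `None` when every backend is full leaves room for the caller to queue the request or to trigger scaling, which is one possible design and not necessarily the one the talk presents.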
Speakers
Mateusz Charytoniuk
