
GPUStack: Building a Simple and Scalable Management Experience for Diverse AI Models

UB2.252A (Lameere) | Day 2 | 12:20 - 12:40 | Speakers: Lawrence Li, Frank Mai



Abstract

Outstanding tools like llama.cpp, Ollama, and LM Studio have made life significantly easier for developers, and running large language models (LLMs) on a laptop has become remarkably convenient. However, inference engines and their wrappers don’t address the following challenges:

1. Scaling your solution as your team grows.
2. Supporting models beyond LLMs, such as diffusion models for role-playing applications, TTS models for NotebookLM equivalents, and rerankers and embeddings for retrieval-augmented generation (RAG).

Today, both models and inference engines are highly diverse and rapidly evolving, while GPU resources remain fragmented and heterogeneous. In this talk, we will share our experience building GPUStack — a platform designed to help developers abstract away these complexities and focus solely on building APIs for AI applications.

Speakers

Lawrence Li
Frank Mai
