Name: GPUStack: Building a Simple and Scalable Management Experience for Diverse AI Models
Start: 2025-02-01T12:20:00
End: 2025-02-01T12:20:00
Location: UB2.252A (Lameere)

Abstract

Outstanding tools like llama.cpp, Ollama, and LM Studio have made life significantly easier for developers. Running large language models (LLMs) on laptops has become remarkably convenient. However, inference engines and their wrappers don’t address the following challenges:

1. Scaling your solution as your team grows.
2. Supporting models beyond LLMs, such as diffusion models for role-playing applications, TTS models for NotebookLM equivalents, rerankers and embeddings for retrieval-augmented generation (RAG), and more.

Today, both models and inference engines are highly diverse and rapidly evolving, while GPU resources remain fragmented and heterogeneous. In this talk, we will share our experience building GPUStack — a platform designed to help developers abstract away these complexities and focus solely on building APIs for AI applications.
