
Building Cloud Infrastructure for AI

H.2213 | Day 1 | 14:00 - 14:30 | Speakers: Dave Hughes, Lukas Stockner

A picture of a devroom at FOSDEM 2024


Abstract

"GPU clouds" for AI applications are the hot topic at the moment, but they often either end up as big, traditional HPC-style cluster deployments rather than actual cloud infrastructure, or are built in secrecy by hyperscalers.

In this talk, we'll explore what makes a "GPU cloud" an actual cloud, how its requirements differ from those of traditional cloud infrastructure, and, most importantly, how you can build your own using open source technology: from hardware selection (do you really need to buy the six-figure boxes?) through firmware (OpenBMC), networking (SONiC, VPP), storage (Ceph, SPDK), orchestration (K8s, but not the way you think), OS deployment (mkosi, UEFI HTTP netboot), virtualization (QEMU, vhost-user) and performance tuning (NUMA, RDMA), all the way to various managed services (load balancing, API gateways, Slurm, etc.).



Notice: The placeholder video image is licensed under CC BY-SA 4.0. The original image can be found here. Changes made to the image are: cropped the image to a new ratio; part of the image was cut off.