One GPU, Many Models: What Works and What Segfaults
UD2.120 (Chavanne) | Day 1 | 13:55 - 14:15 | Speakers: YASH PANCHAL
Abstract
Serving multiple models on a single GPU sounds great until something segfaults.
Two approaches dominate for parallel inference: MIG (hardware partitioning) and MPS (software sharing). Both promise efficient GPU sharing.
I tested both strategies with video generation workloads running in parallel.
This talk digs into what actually happened: where things worked, where memory isolation fell apart, which configs crashed, and what survives under load.
By the end, you'll know:
- How to utilize unused GPU capacity.
- How to set up MIG and MPS.
- Which memory issues, crashes, and failures to expect.
- Which configs suit which workloads.
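
For orientation, here is a minimal dry-run sketch of what setting up MIG and MPS typically involves. It is not the speaker's configuration. It only builds and prints the standard `nvidia-smi` and `nvidia-cuda-mps-control` invocations, without running them. The GPU index, the MIG profile IDs (`19` is an assumed example, and valid IDs differ per GPU model, so list them with `-lgip` first), the directory paths, and the 50% thread cap are all hypothetical placeholders.

```python
import shlex


def mig_commands(gpu_index=0, profile_ids=("19", "19")):
    """Build nvidia-smi commands that enable MIG and create instances.

    profile_ids are placeholders: check the output of the -lgip command
    for the profiles your GPU actually supports.
    """
    gpu = str(gpu_index)
    return [
        # Enable MIG mode on the GPU (may require a GPU reset).
        ["nvidia-smi", "-i", gpu, "-mig", "1"],
        # List the GPU instance profiles available on this device.
        ["nvidia-smi", "mig", "-i", gpu, "-lgip"],
        # Create GPU instances and their default compute instances.
        ["nvidia-smi", "mig", "-i", gpu, "-cgi", ",".join(profile_ids), "-C"],
    ]


def mps_daemon(pipe_dir="/tmp/mps-pipe", log_dir="/tmp/mps-log", gpu_index=0):
    """Build the environment and command that start the MPS control daemon."""
    env = {
        "CUDA_VISIBLE_DEVICES": str(gpu_index),
        "CUDA_MPS_PIPE_DIRECTORY": pipe_dir,
        "CUDA_MPS_LOG_DIRECTORY": log_dir,
    }
    return env, ["nvidia-cuda-mps-control", "-d"]


def mps_client_env(pipe_dir="/tmp/mps-pipe", thread_pct=50):
    """Environment for one inference process that shares the GPU via MPS."""
    return {
        "CUDA_MPS_PIPE_DIRECTORY": pipe_dir,
        # Caps the share of SMs this client may use (hypothetical value).
        "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE": str(thread_pct),
    }


# Dry run: print the commands instead of executing them.
for cmd in mig_commands():
    print(shlex.join(cmd))
daemon_env, daemon_cmd = mps_daemon()
print(daemon_env, shlex.join(daemon_cmd))
print(mps_client_env())
```

With MIG, each model process is then usually pinned to one instance by setting `CUDA_VISIBLE_DEVICES` to that instance's MIG UUID (shown by `nvidia-smi -L`). With MPS, every client process points at the same pipe directory as the daemon.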
