One GPU, Many Models: What Works and What Segfaults

Start: 2026-01-31T13:55:00
End: 2026-01-31T14:15:00
Location: UD2.120 (Chavanne)

UD2.120 (Chavanne) | Day 1 | 13:55 - 14:15 | Speakers: YASH PANCHAL

Abstract

Serving multiple models on a single GPU sounds great until something segfaults.

Two approaches dominate for parallel inference: MIG (Multi-Instance GPU, hardware partitioning) and MPS (Multi-Process Service, software sharing). Both promise efficient GPU sharing.

I tested both strategies by running video generation workloads in parallel.

This talk digs into what actually happened: where things worked, where memory isolation fell apart, which configs crashed, and what survives under load.

By the end, you'll know:

How to use idle GPU capacity.
How to set up MIG and MPS.
Which memory issues, crashes, and failures to expect.
Which configurations suit which workloads.

Attachments

Slides

Speakers

YASH PANCHAL

Yash is an SDET III at Percona, with a background in system administration, DevOps, and QA.

In his spare time, he breaks GPUs by stress-testing their memory and compute limits to find efficient ways to run new AI models.
