
GPU Virtualization with MIG: Multi-Tenant Isolation for AI Inference Workloads

H.2213 | Day 1 | 18:00 - 18:30 | Speakers: YASH PANCHAL



Abstract

Serving AI models to multiple tenants from a single GPU sounds challenging until you partition the GPU correctly.

This talk is a deep technical exploration of running AI inference workloads on modern GPUs using Multi-Instance GPU (MIG) isolation.

We'll explore:

  1. The multi-tenant problem: MIG vs. other GPU slicing methods.
  2. MIG fundamentals: key concepts, how it works, and support.
  3. Managing MIG instances: creation, configuration, monitoring, and deletion.
  4. Identifying the right approach for your workload.
  5. Common issues and failures.

Whether you're building a multi-tenant inference platform, optimizing GPU utilization for your team, or exploring how to serve AI models cost-effectively, this talk provides practical configurations for your AI workloads.
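
As a rough illustration of the instance lifecycle named in item 3 (creation, monitoring, deletion), here is a minimal Python sketch that assembles the corresponding nvidia-smi invocations. It is not taken from the talk. The function names, the GPU index 0, and the profile IDs (9, 9) are assumptions for illustration only: valid profile IDs depend on the GPU model and should be read from the `-lgip` listing. By default the sketch only prints the commands, since running them needs a MIG-capable GPU and administrative privileges.

```python
import shlex
import subprocess


def mig_lifecycle_commands(gpu_index=0, profile_ids=(9, 9)):
    """Return nvidia-smi invocations for one MIG lifecycle, in order.

    gpu_index and profile_ids are illustrative placeholders (assumptions);
    check the -lgip output for the profile IDs your GPU actually supports.
    """
    gpu = str(gpu_index)
    profiles = ",".join(str(p) for p in profile_ids)
    return [
        # Enable MIG mode on the GPU (a GPU reset may be needed afterwards).
        ["nvidia-smi", "-i", gpu, "-mig", "1"],
        # List the GPU instance profiles this GPU supports.
        ["nvidia-smi", "mig", "-i", gpu, "-lgip"],
        # Create GPU instances plus their default compute instances (-C).
        ["nvidia-smi", "mig", "-i", gpu, "-cgi", profiles, "-C"],
        # Monitoring: list the GPU instances that now exist.
        ["nvidia-smi", "mig", "-i", gpu, "-lgi"],
        # Deletion: compute instances first, then the GPU instances.
        ["nvidia-smi", "mig", "-i", gpu, "-dci"],
        ["nvidia-smi", "mig", "-i", gpu, "-dgi"],
    ]


def run_lifecycle(commands, dry_run=True):
    """Return each command as a shell-quoted string; execute only if dry_run is False."""
    lines = []
    for cmd in commands:
        lines.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return lines


if __name__ == "__main__":
    # Dry run: show what would be executed without touching the GPU.
    for line in run_lifecycle(mig_lifecycle_commands()):
        print(line)
```

Keeping the commands as argument lists and defaulting to a dry run makes the sequence easy to review before anything changes GPU state. Calling `run_lifecycle(..., dry_run=False)` would execute the steps in order and stop at the first failure.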
