
RamaLama: Making working with AI Models Boring

UB2.252A (Lameere) | Day 2 | 13:00 - 13:20 | Speakers: Eric Curtin



Abstract

Managing and deploying AI models often requires extensive system configuration and complex software dependencies. RamaLama, a new open-source tool, aims to make working with AI models "boring": predictable, reliable, and easy to manage. It integrates with container engines such as Podman and Docker to run AI models inside containers, removing the need for manual configuration and providing a suitable setup for both CPU and GPU systems.

This talk will introduce RamaLama’s key features, including support for multiple AI model registries (Ollama, Hugging Face, and OCI), simplified commands for running models as chatbots or REST API services, and compatibility with alternative AI runtimes such as llama.cpp and vLLM. We’ll explore RamaLama’s distinctive capabilities, such as generating Podman Quadlet files for edge deployments and Kubernetes YAML for scalable deployments, and demonstrate how it lets developers move from local experimentation to production. Join us to learn how RamaLama enables low-friction, containerized AI model deployment for developers and system administrators alike.
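As a rough illustration of the "simplified commands" idea, the sketch below builds (but does not execute) a command line for running a model as a chatbot or as a REST service. It is not taken from the talk: the helper names `split_model_ref` and `build_command` are hypothetical, and the `ollama://`, `hf://`, and `oci://` prefixes and the `run` and `serve` subcommands are assumptions about how the registries and modes named above are addressed.

```python
import shlex

# Transport prefixes assumed from the registries named in the abstract
# (Ollama, Hugging Face, OCI); the exact spellings are an assumption.
KNOWN_TRANSPORTS = ("ollama", "hf", "oci")


def split_model_ref(ref: str) -> tuple[str, str]:
    """Split a 'transport://name' reference into (transport, name)."""
    transport, sep, name = ref.partition("://")
    if not sep or transport not in KNOWN_TRANSPORTS or not name:
        raise ValueError(f"unrecognised model reference: {ref!r}")
    return transport, name


def build_command(action: str, ref: str) -> list[str]:
    """Return an argv list for a chatbot ('run') or REST service ('serve').

    The command is only constructed here, never executed.
    """
    if action not in ("run", "serve"):
        raise ValueError(f"unsupported action: {action!r}")
    split_model_ref(ref)  # validate the reference; result not needed here
    return ["ramalama", action, ref]


if __name__ == "__main__":
    # Print the two command lines for a hypothetical model reference.
    for action in ("run", "serve"):
        print(shlex.join(build_command(action, "ollama://tinyllama")))
```

The same model reference is used for both modes in this sketch; only the subcommand changes, which is the kind of uniformity the abstract describes.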

Speakers

Eric Curtin
