Beyond TinyML: Balance inference accuracy and latency on MCUs
UD2.120 (Chavanne) | Day 1 | 11:50 - 12:10 | Speakers: Charalampos Mainas, Anastassios Nanos, Anastasia Mallikopoulou
Abstract
Can an ESP32-based MCU run (tiny)ML models accurately and efficiently? This talk showcases how a tiny microcontroller can transparently leverage neighboring nodes to run inference on full, unquantized torchvision models in under 100 ms! We build on vAccel, an open abstraction layer for interoperable hardware acceleration, to enable devices like the ESP32 to transparently offload ML inference and signal-processing tasks to nearby edge or cloud nodes. Through a lightweight agent and a unified API, vAccel bridges heterogeneous devices, enabling seamless offload without modifying application logic.
This session presents our IoT port of vAccel (client and lightweight agent) and demonstrates a real deployment where an ESP32 delegates inference to a GPU-backed Kubernetes (k8s) node, reducing latency by three orders of magnitude while preserving Kubernetes-native control and observability. Attendees will see how open acceleration can unify the Cloud–Edge–IoT stack through standard interfaces and reusable runtimes.
Speakers
Charalampos Mainas is a systems software engineer with a strong interest in virtualization technologies and operating systems. His main focus is on improving the performance and scalability of lightweight VMMs. A significant portion of his work has been dedicated to unikernels, including porting applications, libraries, and language runtimes, with an emphasis on enhancing their compatibility with existing technologies. In that context, he leads the development of bunny and urunc, which allow users to simply docker build and docker run unikernels and similar technologies.
I am a Researcher in Computer Systems and I am currently working on the lower-level parts of the stack to attack issues related to performance, scalability, power-efficiency and security in hypervisors.
Since 2015 I have been affiliated with UK and EU firms, building and architecting solutions for efficient execution of workloads in the Cloud and at the Edge. I have been involved in many parts of the systems software stack, including device drivers, memory management, and the network and block layers.
I’m a software engineer interested in machine learning workloads and system performance across cloud and edge environments, focusing on observability and understanding real-world system behavior.
