Federating Databases with Apache DataFusion: Open Query Planning and Arrow-Native Interoperability

UB2.252A (Lameere) | Day 1 | 17:50 - 18:10 | Speakers: Michiel De Backker, Ghasan Mohammad (hozan23)

Abstract

Apache DataFusion is emerging as a powerful open-source foundation for building interoperable data systems, thanks to its strongly modular design, Arrow-native execution model, and growing ecosystem of extension libraries. In this talk, we'll explore our contributions to the DataFusion ecosystem—most notably DataFusion Federation for cross-database query execution and DataFusion Table Providers that connect DataFusion to a wide range of backends.

We'll show how we use these components to federate queries to databases such as TiDB and InfluxDB 2, and how this fits into the broader data fabric and API generation work we're doing at Twintag. We'll also discuss our work on Arrow-native interfaces, including an Arrow Flight SQL Server implementation for DataFusion and a prototype Flight SQL endpoint for TiDB, which together enable a fully Arrow-based pipeline spanning query submission, execution, and federated dispatch.

The session highlights practical patterns for building distributed data infrastructure using open libraries rather than monolithic systems, and offers a look at where Arrow and DataFusion are headed as shared interoperability layers for modern databases.

Speakers

Michiel De Backker

As the CTO at Twintag, I spend my professional time tackling Cloud Native scalability and complex Database challenges. I enjoy architecting systems that can handle heavy loads and critical data.

Off the clock, I’m a FOSS enthusiast obsessed with how browsers connect to the real world. I spend my free time contributing to WebRTC (@Pion) projects and pushing for better Local Peer-to-Peer standards. You can find my Go and Rust code on GitHub at @backkem.

Ghasan Mohammad (hozan23)

I am a software developer and open-source enthusiast with an interest in peer-to-peer networking, distributed systems, and database technologies.

