Apache XTable - Interoperability Across Apache Hudi, Apache Iceberg, and Delta Lake
Day 1 | 11:50 | 00:30 | UB5.132 | Dipankar Mazumdar
With the rise in organizations adopting the open data lakehouse architecture, the ecosystem continues to evolve rapidly. Even as industry vendors attempt to standardize around a replacement for good-old Hive, users continue to choose specific table formats such as Apache Hudi, Iceberg, or Delta Lake for their strengths in specific workloads. New open table formats and columnar file formats optimized for streaming or unstructured data continue to emerge, and recently a crop of new open-source data catalogs has also been gaining mindshare.
These developments underscore the need for an interoperability standard for the open data lakehouse, one that ensures all the pieces fit together without lock-in to any single format, catalog, or compute engine. Apache XTable, an open-source project currently incubating under the Apache Software Foundation, addresses this challenge by enabling interoperability across table formats without rewriting any data files, so that each format’s benefits remain accessible within a unified lakehouse setup.
This talk will go over XTable's architecture and its capabilities, which are designed to support omnidirectional translation between open table formats. By utilizing shared metadata, XTable continuously translates metadata across these formats, so that various compute engines and workflows can access the same data without duplication or extensive rewrites. Unlike one-off conversions, XTable’s continuous metadata updates translate new commits incrementally, which can be beneficial at scale. Moving up a layer, we will then show how to synchronize XTable with multiple open-source and proprietary data catalogs to manage permissions and access control policies. This makes it easy for users to bring best-of-breed compute engines to their data.
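To make the difference between a one-off conversion and continuous, incremental translation concrete, here is a toy Python model. It is a sketch under stated assumptions, not XTable's API or implementation: `Commit`, `TargetTable`, and `incremental_sync` are hypothetical names, and real table-format metadata (Hudi timeline, Iceberg manifests, Delta log) carries far more than a list of file paths. The model assumes each target format remembers the last source commit it translated, so every sync run only processes newer commits and only copies file paths, never the data files themselves.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """Hypothetical model of one commit in a source table's log."""
    commit_id: int
    files_added: frozenset[str]
    files_removed: frozenset[str]


@dataclass
class TargetTable:
    """Hypothetical target-format metadata: each snapshot is a set of data file paths."""
    format_name: str
    snapshots: list[frozenset[str]] = field(default_factory=list)
    last_synced_commit: int | None = None  # watermark: last source commit translated

    def current_files(self) -> frozenset[str]:
        return self.snapshots[-1] if self.snapshots else frozenset()


def incremental_sync(source_log: list[Commit], target: TargetTable) -> int:
    """Translate only commits newer than the target's watermark.

    Only metadata (file paths) is copied; data files are never read or rewritten.
    Returns the number of commits translated in this run.
    """
    pending = [
        c for c in source_log
        if target.last_synced_commit is None or c.commit_id > target.last_synced_commit
    ]
    for commit in pending:
        files = (target.current_files() - commit.files_removed) | commit.files_added
        target.snapshots.append(files)
        target.last_synced_commit = commit.commit_id
    return len(pending)


# Usage: one source log translated into two hypothetical target formats.
log = [
    Commit(1, frozenset({"part-0.parquet"}), frozenset()),
    Commit(2, frozenset({"part-1.parquet"}), frozenset()),
]
iceberg = TargetTable("ICEBERG")
delta = TargetTable("DELTA")
for t in (iceberg, delta):
    incremental_sync(log, t)  # initial sync translates both commits

# A new source commit arrives; only that commit is translated on the next run.
log.append(Commit(3, frozenset({"part-2.parquet"}), frozenset({"part-0.parquet"})))
print(incremental_sync(log, iceberg))  # 1
print(sorted(iceberg.current_files()))  # ['part-1.parquet', 'part-2.parquet']
```

In this sketch, the per-target watermark is what lets each target lag behind or catch up independently; a one-off conversion would instead rebuild the full file listing from scratch on every run.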
Finally, we’ll present lessons learned and community updates on deployment options for XTable, for example running it with Managed Workflows for Apache Airflow (MWAA) and AWS Lambda, as well as how Microsoft Fabric and Snowflake are pursuing interoperability with XTable.
project: https://github.com/apache/incubator-xtable