XTable in Action: Seamless Interoperability in Data Lakes
Ashvin Agrawal, Tim Brown, Anoop Johnson, Jesús Camacho-Rodríguez, Kyle Weller, Carlo Curino, Raghu Ramakrishnan

TL;DR
XTable provides seamless interoperability between open standard table formats in data lakes, enabling flexible data access and sharing across formats with minimal overhead, thus addressing format selection and compatibility challenges.
Contribution
We introduce XTable, an omni-directional translator that allows writing data in one format and reading it in any other, enhancing interoperability in data lake architectures.
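One way to picture an omni-directional translator is a hub-and-spoke design: each format contributes one reader into a neutral representation and one writer out of it, so N formats need N reader/writer pairs rather than a dedicated converter for every ordered pair. The sketch below is a toy illustration under that assumption. The summary above does not describe XTable's internals, and every name here (`TableSnapshot`, `register`, `translate`, the metadata keys) is hypothetical and is not XTable's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TableSnapshot:
    """Hypothetical format-neutral description of a table (not XTable's real model)."""
    schema: Dict[str, str]
    data_files: List[str] = field(default_factory=list)


# One reader (format metadata -> neutral snapshot) and one writer
# (neutral snapshot -> format metadata) per format.
READERS: Dict[str, Callable[[dict], TableSnapshot]] = {}
WRITERS: Dict[str, Callable[[TableSnapshot], dict]] = {}


def register(fmt: str, files_key: str) -> None:
    """Register a toy reader/writer pair.

    Assumption: formats differ only in the key that lists data files.
    Real table formats differ in far more than that.
    """
    READERS[fmt] = lambda meta: TableSnapshot(dict(meta["schema"]), list(meta[files_key]))
    WRITERS[fmt] = lambda snap: {
        "format": fmt,
        "schema": dict(snap.schema),
        files_key: list(snap.data_files),
    }


# Toy key names chosen for illustration only.
for fmt, key in [("delta", "add_files"), ("hudi", "base_files"), ("iceberg", "manifest_entries")]:
    register(fmt, key)


def translate(meta: dict, source: str, target: str) -> dict:
    """Translate toy metadata from `source` to `target` via the neutral snapshot."""
    return WRITERS[target](READERS[source](meta))


delta_meta = {"format": "delta", "schema": {"id": "long"}, "add_files": ["part-0001.parquet"]}
iceberg_meta = translate(delta_meta, "delta", "iceberg")
round_trip = translate(iceberg_meta, "iceberg", "delta")
```

In this toy model only the metadata is rewritten and the list of data files passes through unchanged, which is one plausible reading of how translation could stay cheap. Adding a fourth format would take a single extra `register` call instead of six new pairwise converters.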
Findings
XTable enables format interoperability with negligible overhead.
Application scenarios demonstrate practical benefits in real-world use cases.
XTable simplifies data management by reducing format dependency.
Abstract
Contemporary approaches to data management are increasingly relying on unified analytics and AI platforms to foster collaboration, interoperability, seamless access to reliable data, and high performance. Data lakes featuring open standard table formats such as Delta Lake, Apache Hudi, and Apache Iceberg are central components of these data architectures. Choosing the right format for managing a table is crucial for achieving the objectives mentioned above. The challenge lies in selecting the best format, a task that is onerous and can yield only temporary results, as the ideal choice may shift over time with data growth, evolving workloads, and the competitive development of table formats and processing engines. Moreover, restricting data access to a single format can hinder data sharing, resulting in diminished business value over the long term. The ability to seamlessly interoperate…
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Advanced Data Storage Technologies
