Automated Migration of Hierarchical Data to Relational Tables using Programming-by-Example
Navid Yaghmazadeh, Xinyu Wang, Isil Dillig

TL;DR
This paper introduces Mitra, a programming-by-example tool that automatically converts hierarchical XML and JSON data into relational tables, demonstrating high success rates and efficiency in real-world data transformation tasks.
Contribution
The paper presents a novel programming-by-example approach, implemented in a tool called Mitra, for automatically migrating hierarchical (tree-structured) documents to relational tables, and evaluates it on StackOverflow benchmarks and real-world XML and JSON datasets.
Findings
Mitra generates the desired program for 94% of 98 data transformation tasks collected from StackOverflow.
Mitra automates the conversion of all evaluated real-world XML and JSON datasets to full-fledged relational databases.
Average synthesis time on the StackOverflow benchmarks is 3.8 seconds.
Abstract
While many applications export data in hierarchical formats like XML and JSON, it is often necessary to convert such hierarchical documents to a relational representation. This paper presents a novel programming-by-example approach, and its implementation in a tool called Mitra, for automatically migrating tree-structured documents to relational tables. We have evaluated the proposed technique using two sets of experiments. In the first experiment, we used Mitra to automate 98 data transformation tasks collected from StackOverflow. Our method can generate the desired program for 94% of these benchmarks with an average synthesis time of 3.8 seconds. In the second experiment, we used Mitra to generate programs that can convert real-world XML and JSON datasets to full-fledged relational databases. Our evaluation shows that Mitra can automate the desired transformation for all datasets.
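To make the task concrete, the following is a minimal sketch of the kind of input-output example a user might provide and of a hand-written program consistent with it. It is not Mitra's DSL or synthesis algorithm: the document `DOC`, the example rows `EXAMPLE`, and the functions `flatten` and `is_consistent` are all hypothetical names introduced here for illustration.

```python
import json

# Hypothetical hierarchical input (not taken from the paper).
DOC = json.loads("""
{
  "departments": [
    {"name": "Sales",
     "employees": [{"name": "Ann", "age": 31}, {"name": "Bob", "age": 45}]},
    {"name": "R&D",
     "employees": [{"name": "Cy", "age": 28}]}
  ]
}
""")

# Hypothetical output example: the relational rows the user wants.
EXAMPLE = [("Sales", "Ann", 31), ("Sales", "Bob", 45), ("R&D", "Cy", 28)]


def flatten(doc):
    """Hand-written stand-in for a synthesized program.

    Emits one (department, employee, age) row per employee node.
    """
    rows = []
    for dept in doc["departments"]:
        for emp in dept["employees"]:
            rows.append((dept["name"], emp["name"], emp["age"]))
    return rows


def is_consistent(program, doc, example_rows):
    """Check that a candidate program reproduces the example rows.

    Row order is ignored, since a relational table is a collection of tuples.
    """
    return sorted(program(doc)) == sorted(example_rows)


if __name__ == "__main__":
    print(flatten(DOC))
    print(is_consistent(flatten, DOC, EXAMPLE))
```

In a programming-by-example setting, the user supplies only the document and the example rows; a program playing the role of `flatten` is what the tool is expected to produce, and a check like `is_consistent` is the acceptance criterion.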
Taxonomy
Topics: Advanced Database Systems and Queries · Semantic Web and Ontologies · Data Quality and Management
