Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples
Peng Li, Yeye He, Cong Yan, Yue Wang, Surajit Chaudhuri

TL;DR
Auto-Tables automatically generates multi-step data transformation pipelines to convert non-relational tables into relational form, simplifying data preparation for analytics without manual programming.
Contribution
We introduce Auto-Tables, a system that takes a real-world non-relational table as input and, without user-provided examples, synthesizes a multi-step transformation pipeline that converts it into relational form, addressing a key pain point in data preparation.
Findings
Successfully synthesizes transformations for over 70% of test cases
Operates at interactive speeds without user input
Introduces a benchmark of 244 real-world test cases
Abstract
Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases. However, such a standard cannot be taken for granted when dealing with tables "in the wild". Our survey of real spreadsheet-tables and web-tables shows that over 30% of such tables do not conform to the relational standard, for which complex table-restructuring transformations are needed before these tables can be queried easily using SQL-based analytics tools. Unfortunately, the required transformations are non-trivial to program, which has become a substantial pain point for technical and non-technical users alike, as evidenced by large numbers of forum questions in places like StackOverflow and Excel/Power-BI/Tableau forums. We develop an Auto-Tables system that can automatically synthesize pipelines with multi-step…
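To make "table-restructuring transformation" concrete, here is a minimal hand-written sketch of one common restructuring step: unpivoting a wide table so that each row describes a single observation. This is not the Auto-Tables synthesis method; it only illustrates the kind of step such a pipeline might contain. The table contents, the column names, and the `unpivot` helper are all hypothetical.

```python
# Hypothetical "wide" table: each year is its own column, so a filter such as
# "year = 2020" cannot be expressed directly against this layout in SQL.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 11},
    {"country": "B", "2019": 20, "2020": 21},
]


def unpivot(rows, id_col, var_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col == id_col:
                continue  # the id column is repeated on each output row instead
            long_rows.append({id_col: row[id_col], var_name: col, value_name: val})
    return long_rows


# Relational ("long") form: one row per (country, year) pair.
long_rows = unpivot(wide_rows, "country", "year", "sales")
for r in long_rows:
    print(r)
```

In the long form, each attribute (country, year, sales) is a column and each row is one entity, which is the relational layout the abstract describes.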
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Software Engineering Research
