UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification
Poojah Ganesan, Rajat Aayush Jha, Dan Roth, Vivek Gupta

TL;DR
UNJOIN introduces a two-stage schema simplification and retrieval approach for multi-table Text-to-SQL tasks, significantly improving accuracy and generalization without requiring data access or fine-tuning.
Contribution
UNJOIN proposes a schema simplification method that decouples schema retrieval from SQL generation, improving performance on multi-table queries.
Findings
Achieves state-of-the-art results on the SPIDER and BIRD datasets.
Requires neither data access nor fine-tuning, which supports scalability.
Effectively handles complex schemas and relational operations.
Abstract
Recent advances in large language models (LLMs) have greatly improved Text-to-SQL performance for single-table queries. However, the task remains challenging in multi-table databases because of complex schemas and relational operations. Existing methods often struggle with retrieving the right tables and columns, generating accurate JOINs and UNIONs, and generalizing across diverse schemas. To address these issues, we introduce UNJOIN, a two-stage framework that decouples the retrieval of schema elements from SQL logic generation. In the first stage, we merge the column names of all tables in the database into a single-table representation by prefixing each column with its table name. This allows the model to focus purely on accurate retrieval without being distracted by the need to write complex SQL logic. In the second stage, the SQL query is generated on this simplified schema and mapped back to…
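The first-stage idea described in the abstract, merging every table's columns into one flat list by prefixing each column with its table name, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `flatten_schema`, the `.` separator, and the toy `singer`/`concert` schema are all assumptions made for illustration.

```python
from typing import Dict, List


def flatten_schema(schema: Dict[str, List[str]], sep: str = ".") -> List[str]:
    """Merge all tables into a single list of table-prefixed column names.

    `schema` maps each table name to its column names. The separator is an
    assumption for this sketch; the paper may use a different convention.
    """
    return [
        f"{table}{sep}{column}"
        for table, columns in schema.items()
        for column in columns
    ]


# Hypothetical toy schema, used only to illustrate the transformation.
schema = {
    "singer": ["singer_id", "name"],
    "concert": ["concert_id", "singer_id", "year"],
}

flat = flatten_schema(schema)
print(flat)
# ['singer.singer_id', 'singer.name', 'concert.concert_id',
#  'concert.singer_id', 'concert.year']
```

Because the prefix keeps the table name attached to every column, two columns that share a name (here `singer_id`) remain distinguishable in the flattened representation.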
