UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification

Poojah Ganesan; Rajat Aayush Jha; Dan Roth; Vivek Gupta

arXiv:2505.18122·cs.CL·May 26, 2025

TL;DR

UNJOIN introduces a two-stage approach to multi-table Text-to-SQL that first retrieves schema elements from a simplified, single-table view of the database and then generates SQL. It improves accuracy and generalization without requiring data access or fine-tuning.

Contribution

The paper proposes a schema simplification method that decouples schema retrieval from SQL generation, improving performance on multi-table queries.

Findings

1. Achieves state-of-the-art results on the SPIDER and BIRD datasets.
2. Does not require data access or fine-tuning, which supports scalability.
3. Effectively handles complex schemas and relational operations.

Abstract

Recent advances in large language models (LLMs) have greatly improved Text-to-SQL performance for single-table queries. However, Text-to-SQL remains challenging for multi-table databases because of complex schemas and relational operations. Existing methods often struggle to retrieve the right tables and columns, to generate accurate JOINs and UNIONs, and to generalize across diverse schemas. To address these issues, we introduce UNJOIN, a two-stage framework that decouples the retrieval of schema elements from SQL logic generation. In the first stage, we merge the column names of all tables in the database into a single-table representation by prefixing each column with its table name. This allows the model to focus purely on accurate retrieval without being distracted by the need to write complex SQL logic. In the second stage, the SQL query is generated on this simplified schema and mapped back to…
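The first-stage schema flattening described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract only says each column is prefixed with its table name, so the "." separator, the dictionary schema format, and the helper names flatten_schema, split_prefixed, and tables_used are all hypothetical. The sketch merges every table's columns into one list and shows how a set of retrieved prefixed columns can be traced back to the original tables.

```python
from typing import Dict, List, Tuple

# Hypothetical separator between table and column names; the abstract does
# not specify which delimiter UNJOIN uses for the table-name prefix.
SEP = "."


def flatten_schema(schema: Dict[str, List[str]], sep: str = SEP) -> List[str]:
    """Merge all tables' columns into a single-table column list,
    prefixing each column with the name of the table it came from."""
    flat: List[str] = []
    for table, columns in schema.items():
        for column in columns:
            flat.append(f"{table}{sep}{column}")
    return flat


def split_prefixed(name: str, sep: str = SEP) -> Tuple[str, str]:
    """Invert the prefix for one flattened column name.
    Assumes table names never contain the separator."""
    table, column = name.split(sep, 1)
    return table, column


def tables_used(selected: List[str], sep: str = SEP) -> List[str]:
    """Return the original tables touched by a set of retrieved
    flattened columns, in first-seen order."""
    seen: List[str] = []
    for name in selected:
        table, _ = split_prefixed(name, sep)
        if table not in seen:
            seen.append(table)
    return seen


if __name__ == "__main__":
    # Toy schema for illustration only.
    schema = {
        "singer": ["singer_id", "name"],
        "concert": ["concert_id", "singer_id"],
    }
    print(flatten_schema(schema))
    # ['singer.singer_id', 'singer.name', 'concert.concert_id', 'concert.singer_id']
    print(tables_used(["singer.name", "concert.concert_id"]))
    # ['singer', 'concert']
```

Prefixing keeps same-named columns from different tables (here both singer_id columns) distinct in the single-table view, so retrieval never has to resolve that ambiguity. The sketch recovers only which tables were retrieved. It does not cover how the second stage maps the generated query back to the original schema, because the abstract is cut off at that point.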

