CORE-T: COherent REtrieval of Tables for Text-to-SQL
Hassan Soliman, Vivek Gupta, Dan Roth, Iryna Gurevych

TL;DR
CORE-T is a scalable, training-free framework that improves table retrieval for text-to-SQL by combining dense retrieval with LLM-generated purpose metadata and a pre-computed table-compatibility cache, raising table-selection F1 while retrieving fewer tables.
Contribution
The paper introduces CORE-T, a method that enriches tables with LLM-generated purpose metadata and uses a lightweight, pre-computed table-compatibility cache to improve multi-table retrieval without extra training.
Findings
Improves table-selection F1 by up to 22.7 points
Retrieves up to 42% fewer tables
Enhances multi-table execution accuracy by up to 6.9 points
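For readers unfamiliar with the metric, table-selection F1 can be illustrated as follows. This is a minimal sketch that assumes a set-based definition (precision and recall over retrieved versus gold table names); the paper's exact formulation is not given on this page, and the function name `table_selection_f1` is hypothetical.

```python
def table_selection_f1(predicted, gold):
    """Set-based F1 between retrieved table names and gold table names.

    Assumption: the metric is computed per query over sets of table names.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Retrieving one distractor alongside both gold tables lowers precision,
# and therefore F1, even though recall is perfect.
example_f1 = table_selection_f1(
    predicted=["orders", "customers", "weather"],
    gold=["orders", "customers"],
)
```

Under this definition, returning fewer distractors at the same recall raises F1 directly, which is consistent with the first two findings moving together.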
Abstract
Realistic text-to-SQL workflows often require joining multiple tables. As a result, accurately retrieving the relevant set of tables becomes a key bottleneck for end-to-end performance. We study an open-book setting where queries must be answered over large, heterogeneous table collections pooled from many sources, without clean scoping signals such as database identifiers. Here, dense retrieval (DR) achieves high recall but returns many distractors, while join-aware alternatives often rely on extra assumptions and/or incur high inference overhead. We propose CORE-T, a scalable, training-free framework that enriches tables with LLM-generated purpose metadata and pre-computes a lightweight table-compatibility cache. At inference time, DR returns top-K candidates; a single LLM call selects a coherent, joinable subset, and a simple additive adjustment step restores strongly compatible…
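The inference-time flow described in the abstract (dense retrieval of top-K candidates, one selection call, then an additive adjustment) might be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the embeddings are toy vectors, `select_fn` is a hypothetical stand-in for the single LLM call, the compatibility cache is a plain dictionary of pairwise scores, and the threshold rule in `additive_adjust` is an assumed reading of the truncated description.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dense_retrieve(query_vec, table_vecs, k):
    """Return the k table names most similar to the query (the DR step)."""
    ranked = sorted(
        table_vecs,
        key=lambda name: cosine(query_vec, table_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


def additive_adjust(selected, candidates, compat, threshold):
    """Add back dropped candidates that are strongly compatible with a
    selected table. Assumption: 'strongly compatible' means a cached
    pairwise score at or above `threshold`."""
    restored = [
        c for c in candidates
        if c not in selected
        and any(compat.get(frozenset((c, s)), 0.0) >= threshold for s in selected)
    ]
    return list(selected) + restored


def retrieve_tables(query_vec, table_vecs, compat, select_fn, k=3, threshold=0.8):
    candidates = dense_retrieve(query_vec, table_vecs, k)
    selected = select_fn(candidates)  # stand-in for the single LLM call
    return additive_adjust(selected, candidates, compat, threshold)


# Toy data (all values hypothetical).
table_vecs = {
    "orders": [1.0, 0.0],
    "customers": [0.8, 0.6],
    "products": [0.6, 0.8],
    "weather": [0.0, 1.0],
}
compat = {
    frozenset(("orders", "customers")): 0.9,
    frozenset(("orders", "products")): 0.3,
}
result = retrieve_tables(
    query_vec=[1.0, 0.1],
    table_vecs=table_vecs,
    compat=compat,
    select_fn=lambda candidates: candidates[:1],  # stub selector keeps the top hit only
)
```

In this toy run the stub selector keeps only `orders`, and the adjustment step restores `customers` because its cached compatibility with `orders` clears the threshold, while `products` stays out. The adjustment only ever adds tables, which is what makes it additive.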
Taxonomy
Topics: Data Quality and Management · Web Data Mining and Analysis · Advanced Database Systems and Queries
