CORE-T: COherent REtrieval of Tables for Text-to-SQL

Hassan Soliman; Vivek Gupta; Dan Roth; Iryna Gurevych

arXiv:2601.13111·cs.CL·January 21, 2026

TL;DR

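CORE-T is a scalable, training-free framework for multi-table retrieval in text-to-SQL. It combines dense retrieval with LLM-generated purpose metadata and a pre-computed table-compatibility cache, then uses a single LLM call to select a coherent, joinable subset of tables. This improves table-selection accuracy while retrieving fewer tables.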

Contribution

CORE-T enriches tables with LLM-generated purpose metadata and pre-computes a lightweight table-compatibility cache, improving multi-table retrieval without any additional training.
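The contribution above mentions a pre-computed, lightweight table-compatibility cache. This summary does not say how compatibility is scored, so the sketch below is hypothetical: it uses Jaccard overlap of lower-cased column names as a stand-in score, computes it once offline for every table pair, and keeps only non-zero pairs. The names `build_compat_cache` and `compat` are illustrative, not CORE-T's API.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two column-name sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_compat_cache(tables: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Pre-compute a pairwise compatibility score for every table pair.

    Hypothetical stand-in score: Jaccard overlap of lower-cased column names.
    Keys are sorted (name_a, name_b) tuples so lookups are order-independent.
    """
    cols = {name: {c.lower() for c in columns} for name, columns in tables.items()}
    cache: dict[tuple[str, str], float] = {}
    for a, b in combinations(sorted(cols), 2):
        score = jaccard(cols[a], cols[b])
        if score > 0.0:  # keep the cache lightweight: store non-zero pairs only
            cache[(a, b)] = score
    return cache


def compat(cache: dict[tuple[str, str], float], a: str, b: str) -> float:
    """Look up a pair's score at inference time; missing pairs score 0.0."""
    return cache.get(tuple(sorted((a, b))), 0.0)


# Usage with three toy tables (hypothetical data).
tables = {
    "orders": ["order_id", "customer_id", "amount"],
    "customers": ["customer_id", "name"],
    "weather": ["date", "temp"],
}
cache = build_compat_cache(tables)
print(compat(cache, "orders", "customers"))  # 0.25: shares customer_id
print(compat(cache, "weather", "orders"))    # 0.0: no shared columns
```

Column-name overlap is a crude proxy: generic names such as `id` or `name` can make unrelated tables look joinable, so this score only illustrates the shape of the cache, not its content.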

Findings

1. Improves table-selection F1 by up to 22.7 points.
2. Retrieves up to 42% fewer tables.
3. Improves multi-table execution accuracy by up to 6.9 points.
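To make the first two findings concrete, the sketch below computes table-selection F1 as set overlap between the predicted and gold tables for one query, averaged over queries. This is an assumed definition (the summary does not spell out the metric or its averaging), and the function names are hypothetical. Reported "points" are read here as F1 × 100, so a 22.7-point gain would be 0.227 on this 0–1 scale.

```python
def table_selection_f1(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1 between predicted and gold table sets for one query."""
    if not predicted and not gold:
        return 1.0  # nothing needed and nothing retrieved
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(pairs: list[tuple[set[str], set[str]]]) -> float:
    """Average per-query F1 over (predicted, gold) pairs (assumed macro average)."""
    return sum(table_selection_f1(p, g) for p, g in pairs) / len(pairs)


gold = {"orders", "customers"}
# High recall with distractors: both gold tables plus two extra ones.
noisy = {"orders", "customers", "weather", "products"}
# A smaller, coherent subset containing exactly the gold tables.
coherent = {"orders", "customers"}

print(round(table_selection_f1(noisy, gold), 4))  # 0.6667
print(table_selection_f1(coherent, gold))         # 1.0
```

Both predictions have perfect recall; the difference comes entirely from precision, which is why returning fewer distractor tables can raise F1.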

Abstract

Realistic text-to-SQL workflows often require joining multiple tables. As a result, accurately retrieving the relevant set of tables becomes a key bottleneck for end-to-end performance. We study an open-book setting where queries must be answered over large, heterogeneous table collections pooled from many sources, without clean scoping signals such as database identifiers. Here, dense retrieval (DR) achieves high recall but returns many distractors, while join-aware alternatives often rely on extra assumptions and/or incur high inference overhead. We propose CORE-T, a scalable, training-free framework that enriches tables with LLM-generated purpose metadata and pre-computes a lightweight table-compatibility cache. At inference time, DR returns top-K candidates; a single LLM call selects a coherent, joinable subset, and a simple additive adjustment step restores strongly compatible…
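The first two inference steps described in the abstract (dense retrieval of top-K candidates, then one LLM call over those candidates) can be sketched as follows. This is a minimal illustration under stated assumptions, not CORE-T's implementation: table embeddings are given as plain float lists, similarity is cosine, the purpose metadata is a hypothetical `name -> description` dict, and the prompt wording in `build_selection_prompt` is invented. The LLM call itself and the later adjustment step are not shown.

```python
import heapq
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def dense_top_k(query_vec: list[float], table_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the k table names most similar to the query, best first."""
    scored = ((name, cosine(query_vec, vec)) for name, vec in table_vecs.items())
    return [name for name, _ in heapq.nlargest(k, scored, key=lambda item: item[1])]


def build_selection_prompt(question: str, candidates: list[str], metadata: dict[str, str]) -> str:
    """Hypothetical prompt for the single LLM call that picks a joinable subset."""
    lines = [f"Question: {question}", "Candidate tables:"]
    for name in candidates:
        lines.append(f"- {name}: {metadata.get(name, '(no description)')}")
    lines.append("Return the smallest set of tables that can be joined to answer the question.")
    return "\n".join(lines)


# Toy 2-d embeddings and purpose descriptions (hypothetical data).
table_vecs = {"orders": [1.0, 0.0], "customers": [0.8, 0.6], "weather": [0.0, 1.0]}
metadata = {
    "orders": "One row per purchase, keyed by order_id.",
    "customers": "Customer profiles, keyed by customer_id.",
}
candidates = dense_top_k([1.0, 0.1], table_vecs, k=2)
print(candidates)  # ['orders', 'customers']
print(build_selection_prompt("Total spend per customer?", candidates, metadata))
```

Keeping K small bounds the prompt length, so the selection cost stays at one LLM call per query regardless of how large the pooled table collection is.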

Taxonomy

Topics: Data Quality and Management · Web Data Mining and Analysis · Advanced Database Systems and Queries