Scaling Text2SQL via LLM-efficient Schema Filtering with Functional Dependency Graph Rerankers
Thanh Dat Hoang, Thanh Tam Nguyen, Thanh Trung Huynh, Hongzhi Yin, Quoc Viet Hung Nguyen

TL;DR
This paper introduces \toolname, an efficient schema filtering framework for Text2SQL that leverages LLMs and graph-based reranking to handle large schemas, improving accuracy and scalability.
Contribution
The paper presents a novel LLM-efficient schema filtering method combining query-aware encoding, graph reranking, and heuristic sub-schema selection for scalable Text2SQL.
Findings
Achieves near-perfect recall and higher precision than existing methods.
Maintains sub-second latency on schemas with over 23,000 columns.
Effectively scales to large real-world schemas.
Abstract
Most modern Text2SQL systems prompt large language models (LLMs) with entire schemas -- mostly column information -- alongside the user's question. While effective on small databases, this approach fails on real-world schemas that exceed LLM context limits, even for commercial models. The recent Spider 2.0 benchmark exemplifies this with hundreds of tables and tens of thousands of columns, where existing systems often break. Current mitigations either rely on costly multi-step prompting pipelines or filter columns by ranking them against the user's question independently, ignoring inter-column structure. To scale existing systems, we introduce \toolname, an open-source, LLM-efficient schema filtering framework that compacts Text2SQL prompts by (i) ranking columns with a query-aware LLM encoder enriched with values and metadata, (ii) reranking inter-connected columns via a lightweight graph…
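The rank, rerank, select flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: token overlap stands in for the query-aware LLM encoder, a single round of neighbor score propagation stands in for the graph reranker, and a fixed threshold stands in for the heuristic sub-schema selection named under Contribution (the abstract text for that step is truncated above). All function names, the example schema, the edges, and the values of `alpha` and `threshold` are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def initial_scores(question, columns):
    """Assumed stand-in for the encoder: fraction of a column's tokens found in the question."""
    q_tokens = tokenize(question)
    scores = {}
    for name, description in columns.items():
        col_tokens = tokenize(name) | tokenize(description)
        scores[name] = len(q_tokens & col_tokens) / len(col_tokens) if col_tokens else 0.0
    return scores


def rerank(scores, edges, alpha=0.5):
    """Assumed stand-in for the graph reranker: add alpha times the best neighbor score."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {
        col: score + alpha * max((scores[n] for n in neighbors[col]), default=0.0)
        for col, score in scores.items()
    }


def filter_schema(question, columns, edges, threshold=0.5):
    """Keep the columns whose reranked score reaches the (hypothetical) threshold."""
    reranked = rerank(initial_scores(question, columns), edges)
    return sorted(col for col, score in reranked.items() if score >= threshold)


# Hypothetical schema: column name -> short metadata string.
columns = {
    "orders.id": "order identifier",
    "orders.customer_id": "foreign key to customers",
    "customers.id": "customer identifier",
    "customers.name": "customer name",
    "products.sku": "product stock code",
}
# Hypothetical dependency edges between related columns.
edges = [
    ("orders.id", "orders.customer_id"),
    ("orders.customer_id", "customers.id"),
    ("customers.id", "customers.name"),
]
question = "What is the name of each customer?"

independent = initial_scores(question, columns)
kept = filter_schema(question, columns, edges)
print(kept)
```

In this toy example, `customers.id` scores below the threshold when ranked independently but is retained after reranking because it is linked to the high-scoring `customers.name`. That is the kind of inter-column structure the abstract says independent ranking ignores.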
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
