RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL
Jeffrey Eben, Aitzaz Ahmad, Stephen Lau

TL;DR
This paper introduces RASL, a retrieval-based approach that decomposes database schemas into semantic units for scalable, accurate text-to-SQL conversion in large enterprise databases without fine-tuning.
Contribution
The paper presents a novel retrieval architecture that leverages schema decomposition and semantic indexing to improve scalability and accuracy in enterprise-level text-to-SQL systems.
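Schema decomposition can be pictured with a minimal Python sketch. It assumes the schema arrives as a plain dictionary of table and column descriptions. The unit types ("table", "column"), the `SemanticUnit` class, and `decompose_schema` are hypothetical names for illustration, not the authors' code. Each unit's text is what a real system would embed. Grouping units by kind lets each kind be indexed separately.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticUnit:
    kind: str   # hypothetical unit type: "table" or "column"
    table: str  # table that owns this unit
    text: str   # text a real system would embed and index


def decompose_schema(schema: dict) -> dict[str, list[SemanticUnit]]:
    """Split a schema into discrete units, grouped by kind so that each
    kind can be placed in its own index (an illustrative assumption)."""
    indices: dict[str, list[SemanticUnit]] = defaultdict(list)
    for table, meta in schema.items():
        desc = meta.get("description", "")
        # One unit per table, built from its name and description.
        indices["table"].append(SemanticUnit("table", table, f"{table}: {desc}".strip()))
        # One unit per column, qualified by the owning table's name.
        for col, col_desc in meta.get("columns", {}).items():
            indices["column"].append(
                SemanticUnit("column", table, f"{table}.{col}: {col_desc}".strip())
            )
    return dict(indices)


# Usage with a toy, made-up schema.
schema = {
    "orders": {
        "description": "Customer purchase orders",
        "columns": {"order_id": "Primary key", "total": "Order total in USD"},
    },
    "customers": {
        "description": "Customer accounts",
        "columns": {"customer_id": "Primary key"},
    },
}
units = decompose_schema(schema)
```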
Findings
Outperforms baseline methods on large, complex databases.
Maintains high recall and accuracy with extensive metadata.
Enables deployment without domain-specific fine-tuning.
Abstract
Despite advances in large language model (LLM)-based natural language interfaces for databases, scaling to enterprise-level data catalogs remains an under-explored challenge. Prior works addressing this challenge rely on domain-specific fine-tuning, which complicates deployment, and fail to leverage important semantic context contained within database metadata. To address these limitations, we introduce a component-based retrieval architecture that decomposes database schemas and metadata into discrete semantic units, each separately indexed for targeted retrieval. Our approach prioritizes effective table identification while leveraging column-level information, ensuring that the total number of retrieved tables remains within a manageable context budget. Experiments demonstrate that our method maintains high recall and accuracy, with our system outperforming baselines on massive databases…
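Table-focused retrieval under a context budget can be sketched as follows. This is a hedged illustration, not the paper's method. A bag-of-words cosine stands in for embedding similarity. Each table's score is assumed to be the maximum over its table-level and column-level hits. `select_tables`, `top_k`, and `max_tables` are hypothetical names.

```python
import math
import re
from collections import Counter


def _vec(text: str) -> Counter:
    # Bag-of-words vector: a stand-in for a learned embedding (assumption).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_tables(question, table_units, column_units, max_tables=2, top_k=5):
    """Rank tables using both table-level and column-level hits, then keep
    at most `max_tables` so the prompt stays within a context budget.
    Units are (table, text) pairs; each unit type is searched separately."""
    q = _vec(question)
    scores: dict[str, float] = {}
    for units in (table_units, column_units):
        # Top-k search within one index (one unit type).
        scored = sorted(
            ((_cosine(q, _vec(text)), table) for table, text in units), reverse=True
        )[:top_k]
        for s, table in scored:
            if s > 0:
                # A table's score is its best-matching unit of any type.
                scores[table] = max(scores.get(table, 0.0), s)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max_tables]


# Usage with toy, made-up units.
table_units = [
    ("orders", "orders: customer purchase orders"),
    ("customers", "customers: customer accounts and contact details"),
    ("inventory", "inventory: warehouse stock levels"),
]
column_units = [
    ("orders", "orders.total: order total in usd"),
    ("customers", "customers.email: customer email address"),
    ("inventory", "inventory.sku: stock keeping unit"),
]
question = "Total spent by each customer email"
picked = select_tables(question, table_units, column_units, max_tables=2)
```

Taking the maximum lets a single strongly matching column pull its table into the result even when the table description alone is a weak match. This is one plausible way for column-level information to inform table selection. The `max_tables` cap is what keeps the retrieved schema within budget.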
Taxonomy
Topics: Advanced Database Systems and Queries · Data Quality and Management · Semantic Web and Ontologies
