CRUSH4SQL: Collective Retrieval Using Schema Hallucination For Text2SQL
Mayank Kothyari, Dhruva Dhingra, Sunita Sarawagi, Soumen Chakrabarti

TL;DR
CRUSH4SQL introduces a novel two-stage retrieval method that leverages schema hallucination to efficiently subset large database schemas for Text2SQL tasks, improving recall over existing methods.
Contribution
The paper proposes a schema hallucination-based retrieval approach for large databases, enabling effective schema subsetting without encoding entire schemas, and introduces new benchmarks for evaluation.
Findings
Significantly higher recall than state-of-the-art retrieval methods.
Effective schema subsetting for large databases using hallucination.
New benchmarks for schema subsetting in large databases.
Abstract
Existing Text-to-SQL generators require the entire schema to be encoded with the user text. This is expensive or impractical for large databases with tens of thousands of columns. Standard dense retrieval techniques are inadequate for schema subsetting of a large structured database, where the correct semantics of retrieval demands that we rank sets of schema elements rather than individual elements. In response, we propose a two-stage process for effective coverage during retrieval. First, we instruct an LLM to hallucinate a minimal DB schema deemed adequate to answer the query. We then use the hallucinated schema to retrieve a subset of the actual schema by composing the results from multiple dense retrievals. Remarkably, hallucination, generally considered a nuisance, turns out to be useful as a bridging mechanism. Since no existing benchmarks…
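The two-stage process described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the schema and question are invented, the hallucinated schema is hard-coded where an LLM call would go, the `embed` function is a toy bag-of-words stand-in for a real dense encoder, and the round-robin merge in `subset_schema` is a simple placeholder for however the paper actually composes the multiple retrievals.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, schema, k):
    """Return the k real schema elements most similar to one query string."""
    q = embed(query)
    ranked = sorted(schema, key=lambda el: cosine(q, embed(el)), reverse=True)
    return ranked[:k]


def subset_schema(hallucinated, schema, k=2, budget=4):
    """Run one retrieval per hallucinated element and merge the results.

    Assumption: a round-robin merge over ranks, so that every hallucinated
    element contributes its best match before any contributes a second one.
    """
    per_element = [retrieve(h, schema, k) for h in hallucinated]
    chosen = []
    for rank in range(k):
        for hits in per_element:
            if len(chosen) >= budget:
                return chosen
            if rank < len(hits) and hits[rank] not in chosen:
                chosen.append(hits[rank])
    return chosen


# Hypothetical real schema, flattened to "table.column" strings.
SCHEMA = [
    "student.name", "student.id", "student.birth_date",
    "enrollment.student_id", "enrollment.course_id",
    "course.title", "course.id", "building.address",
]

# Stage 1 (hard-coded here): what an LLM might hallucinate for the question
# "Which courses is the student named Alice enrolled in?"
HALLUCINATED = ["student.name", "course.name", "enrollment.course"]

# Stage 2: retrieve real schema elements for each hallucinated one and merge.
subset = subset_schema(HALLUCINATED, SCHEMA)
print(subset)
```

The point of the sketch is the shape of the pipeline: the hallucinated names need not exist in the real schema (there is no `course.name` column here), yet each one serves as a separate retrieval query, and the merged result covers the set of elements the question needs rather than ranking elements one at a time against the raw question.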
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
