A Survey of Large Language Model-Based Generative AI for Text-to-SQL: Benchmarks, Applications, Use Cases, and Challenges
Aditi Singh, Akash Shetty, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei

TL;DR
This survey reviews the evolution, applications, and challenges of large language model-based text-to-SQL systems, emphasizing datasets, domain applications, and future research directions for improving scalability and versatility.
Contribution
It provides a comprehensive overview of LLM-based text-to-SQL systems, highlighting recent advancements, identifying key challenges, and proposing future research directions.
Findings
LLMs have significantly advanced text-to-SQL performance.
Datasets like Spider, WikiSQL, and CoSQL are crucial for progress.
Challenges include domain generalization and multi-turn interaction support.
Abstract
Text-to-SQL systems facilitate smooth interaction with databases by translating natural language queries into Structured Query Language (SQL), bridging the gap between non-technical users and complex database management systems. This survey provides a comprehensive overview of the evolution of AI-driven text-to-SQL systems, highlighting their foundational components, advancements in large language model (LLM) architectures, and the critical role of datasets such as Spider, WikiSQL, and CoSQL in driving progress. We examine the applications of text-to-SQL in domains like healthcare, education, and finance, emphasizing their transformative potential for improving data accessibility. Additionally, we analyze persistent challenges, including domain generalization, query optimization, support for multi-turn conversational interactions, and the limited availability of datasets tailored for…
Taxonomy
Topics: Semantic Web and Ontologies · Scientific Computing and Data Management · Distributed and Parallel Computing Systems
