Evaluating LLMs for Text-to-SQL Generation With Complex SQL Workload
Limin Ma, Ken Pu, Ying Zhu

TL;DR
This paper compares the structural complexity of TPC-DS queries with that of the BIRD and Spider benchmarks, evaluates 11 LLMs for text-to-SQL generation on TPC-DS, and highlights the need for more sophisticated benchmarks and improved models.
Contribution
It introduces structural complexity measures for benchmarks and assesses LLM performance on TPC-DS, revealing current models' limitations in generating accurate complex SQL queries.
Findings
TPC-DS queries are more structurally complex than BIRD and Spider.
State-of-the-art LLMs produce inaccurate SQL queries for complex workloads.
Current models are insufficient for practical, real-world text-to-SQL applications.
Abstract
This study presents a comparative analysis of a complex SQL benchmark, TPC-DS, with two existing text-to-SQL benchmarks, BIRD and Spider. Our findings reveal that TPC-DS queries exhibit a significantly higher level of structural complexity than those of the other two benchmarks. This underscores the need for more intricate benchmarks to simulate realistic scenarios effectively. To facilitate this comparison, we devised several measures of structural complexity and applied them across all three benchmarks. The results of this study can guide future research in the development of more sophisticated text-to-SQL benchmarks. We used 11 distinct large language models (LLMs) to generate SQL queries based on the query descriptions provided by the TPC-DS benchmark. The prompt engineering process incorporated both the query description as outlined in the TPC-DS specification and the database…
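The abstract does not list the structural complexity measures the authors devised, so the following is only a minimal sketch of what a measure of this kind could look like. The function name `structural_features` and the chosen features (joins, SELECT blocks, CTE definitions, set operations, aggregate calls) are assumptions made for illustration, not the paper's definitions, and the keyword counting is a rough heuristic rather than a real SQL parse.

```python
import re


def structural_features(sql: str) -> dict:
    """Count a few surface-level structural features of a SQL string.

    Hypothetical feature set, chosen for illustration only; it is not
    taken from the paper.
    """
    # Blank out string literals so keywords inside quotes are not counted.
    text = re.sub(r"'(?:[^']|'')*'", "''", sql).upper()

    def count(pattern: str) -> int:
        return len(re.findall(pattern, text))

    return {
        "joins": count(r"\bJOIN\b"),
        # More than one SELECT block indicates subqueries or CTE bodies.
        "select_blocks": count(r"\bSELECT\b"),
        # Heuristic: "name AS (" introduces a CTE definition.
        "cte_definitions": count(r"\bAS\s*\("),
        "set_operations": count(r"\b(?:UNION|INTERSECT|EXCEPT)\b"),
        "aggregate_calls": count(r"\b(?:COUNT|SUM|AVG|MIN|MAX)\s*\("),
    }


# Illustrative query over made-up tables (not from any benchmark).
example = """
WITH totals AS (
    SELECT c.id, SUM(o.amount) AS total
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
)
SELECT id FROM totals WHERE total > (SELECT AVG(total) FROM totals)
"""
features = structural_features(example)
```

Applying a function like this to every query in a benchmark and comparing the resulting distributions is one way to contrast workloads; a production version would more likely walk a parsed syntax tree than match keywords.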
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Service-Oriented Architecture and Web Services
