Are LLMs Overkill for Databases?: A Study on the Finiteness of SQL
Yue Li, David Mimno, Unso Eun Seo Jo

TL;DR
This study shows that practical SQL queries generated from natural language are finite and predictable, with most queries fitting into a small set of templates, suggesting LLMs may be overkill for database access.
Contribution
It demonstrates that SQL query complexity is bounded and follows a predictable distribution, challenging the need for large LLMs in database query generation.
Findings
SQL queries are finite in practical complexity.
Most queries can be covered with a small set of templates.
SQL query distribution follows a Power Law-like pattern.
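The coverage finding is a statement about the rank-frequency curve of templates. Sort template types by how many queries each accounts for, then count how few of them are needed to reach a target share of all queries. The sketch below assumes templates have already been extracted and counted. The function name `min_template_share` and the counts are hypothetical: they illustrate the arithmetic only and are not data from the paper.

```python
from collections import Counter


def min_template_share(template_counts: Counter, target: float = 0.70) -> float:
    """Return the smallest fraction of template types whose queries together
    cover at least `target` of all queries (hypothetical helper)."""
    total = sum(template_counts.values())
    if total == 0:
        raise ValueError("no queries to cover")
    covered = 0
    # Taking the most frequent templates first minimizes how many are needed.
    for k, (_, count) in enumerate(template_counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return k / len(template_counts)
    return 1.0  # target not reachable (e.g. target > 1): all templates needed


# Made-up counts for illustration only (not the paper's data).
counts = Counter({"T1": 50, "T2": 20, "T3": 10, "T4": 5,
                  "T5": 5, "T6": 4, "T7": 3, "T8": 3})
print(min_template_share(counts))
```

With these made-up counts, the two most frequent of eight templates (25% of template types) already cover 70% of the queries. The paper reports a steeper head: 13% of template types covering 70% of its tested queries.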
Abstract
Translating natural language to SQL for data retrieval has become more accessible thanks to code generation LLMs. But how hard is it to generate SQL code? While databases can become unbounded in complexity, the complexity of queries is bounded by real-life utility and human needs. Using a sample of 376 databases, we show that SQL queries, as translations of natural language questions, are finite in practical complexity. There is no clear monotonic relationship between increases in database table count and increases in SQL query complexity. In template form, SQL queries follow a Power Law-like frequency distribution in which 70% of our tested queries can be covered by just 13% of all template types, indicating that the vast majority of SQL queries are predictable. This suggests that while LLMs for code generation can be useful, in the domain of database access, they may be…
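This summary does not spell out what counts as a "template". One plausible reading is to mask literals and schema identifiers so that queries differing only in table names, column names, or constants collapse to the same form. The Python sketch below assumes that definition. `to_template`, the keyword list, and the regex tokenizer are hypothetical simplifications (a real pipeline would use a proper SQL parser), not the authors' method.

```python
import re
from collections import Counter

# Hypothetical, deliberately small keyword list (an assumption, not the paper's).
# Any word not listed here is treated as a schema identifier and masked.
KEYWORDS = {
    "SELECT", "FROM", "WHERE", "AND", "OR", "NOT", "GROUP", "BY", "ORDER",
    "HAVING", "LIMIT", "JOIN", "ON", "AS", "IN", "LIKE", "DESC", "ASC",
    "COUNT", "SUM", "AVG", "MIN", "MAX", "DISTINCT", "BETWEEN", "IS", "NULL",
}

# Hypothetical tokenizer; characters it does not recognise (whitespace,
# double quotes, and so on) are simply skipped by finditer.
TOKEN = re.compile(
    r"(?P<str>'(?:[^']|'')*')"              # single-quoted string literal
    r"|(?P<num>\d+(?:\.\d+)?)"              # integer or decimal literal
    r"|(?P<word>[A-Za-z_][A-Za-z0-9_]*)"    # keyword or identifier
    r"|(?P<op><=|>=|<>|!=|[=<>*(),.;+\-/])"  # operators and punctuation
)


def to_template(sql: str) -> str:
    """Mask literals and identifiers so structurally identical queries match."""
    out = []
    for m in TOKEN.finditer(sql):
        kind, text = m.lastgroup, m.group()
        if kind == "str":
            out.append("<str>")
        elif kind == "num":
            out.append("<num>")
        elif kind == "word":
            upper = text.upper()
            out.append(upper if upper in KEYWORDS else "<id>")
        else:
            out.append(text)
    return " ".join(out)


# Toy queries over different hypothetical schemas.
queries = [
    "SELECT name FROM singer WHERE age > 30",
    "select title from songs where year > 1999",
    "SELECT count(*) FROM concert WHERE name = 'O''Brien'",
]
templates = Counter(to_template(q) for q in queries)
for template, n in templates.most_common():
    print(n, template)
```

Counting the resulting strings, as in the last lines, yields the template frequency table that a coverage or Power Law-like analysis would operate on. Masking identifiers is the key design choice here: it is what lets queries over different databases share a template.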
Taxonomy
Topics: Library Science and Information Systems · Advanced Database Systems and Queries · Scientific Computing and Data Management
