The Dawn of Natural Language to SQL: Are We Fully Ready?

Boyan Li; Yuyu Luo; Chengliang Chai; Guoliang Li; Nan Tang

arXiv:2406.01265·cs.DB·July 30, 2024

TL;DR

This paper introduces NL2SQL360, a comprehensive evaluation framework for natural language to SQL translation, compares leading methods, and identifies SuperSQL as a top performer with high execution accuracy.

Contribution

The paper presents NL2SQL360, a new evaluation framework for NL2SQL models, and demonstrates how it can be used to identify the most effective NL2SQL method for specific scenarios.

Findings

01

SuperSQL achieves 87% execution accuracy on the Spider dataset.

02

NL2SQL360 facilitates detailed comparison across different methods and scenarios.

03

SuperSQL outperforms other models in the evaluated benchmarks.
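
The execution accuracy figure quoted above counts a prediction as correct when the predicted SQL returns the same result as the reference SQL on the target database. A minimal sketch of that metric follows, assuming SQLite and an order-insensitive comparison of result rows. The helper names, the `singer` table, and the query pairs are made up for illustration and are not taken from the paper or from NL2SQL360.

```python
import sqlite3
from collections import Counter

def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None

def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose results match."""
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        # A failing prediction yields None and therefore never matches.
        if gold_rows is not None and pred_rows == gold_rows:
            correct += 1
    return correct / len(pairs)

# Toy database and query pairs (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 30), (2, 'Bo', 45), (3, 'Cy', 52);
""")
pairs = [
    # Different text, same result: counted as correct.
    ("SELECT name FROM singer WHERE age > 40",
     "SELECT name FROM singer WHERE age >= 45"),
    # Valid SQL, different result: counted as wrong.
    ("SELECT count(*) FROM singer",
     "SELECT count(*) FROM singer WHERE age > 40"),
    # Invalid column name: the query fails and is counted as wrong.
    ("SELECT nme FROM singer", "SELECT name FROM singer"),
]
accuracy = execution_accuracy(conn, pairs)
```

Comparing rows as a multiset ignores row order, which is a simplification: a stricter check would compare ordered lists whenever the reference query contains an ORDER BY clause.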

Abstract

Translating users' natural language questions into SQL queries (i.e., NL2SQL) significantly lowers the barriers to accessing relational databases. The emergence of Large Language Models has introduced a novel paradigm in NL2SQL tasks, enhancing capabilities dramatically. However, this raises a critical question: Are we fully prepared to deploy NL2SQL models in production? To address the posed questions, we present a multi-angle NL2SQL evaluation framework, NL2SQL360, to facilitate the design and test of new NL2SQL methods for researchers. Through NL2SQL360, we conduct a detailed comparison of leading NL2SQL methods across a range of application scenarios, such as different data domains and SQL characteristics, offering valuable insights for selecting the most appropriate NL2SQL methods for specific needs. Moreover, we explore the NL2SQL design space, leveraging NL2SQL360 to automate…
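
The abstract's idea of comparing methods across SQL characteristics can be pictured as grouping per-query outcomes by features of the reference query and reporting accuracy per group. The sketch below is only an illustration of that idea: the regex detectors, the function name, and the sample results are hypothetical, and the actual NL2SQL360 implementation may work quite differently (for example, by parsing the SQL rather than pattern matching).

```python
import re
from collections import defaultdict

# Hypothetical characteristic detectors based on simple patterns.
CHARACTERISTICS = {
    "join": re.compile(r"\bJOIN\b", re.IGNORECASE),
    "subquery": re.compile(r"\(\s*SELECT\b", re.IGNORECASE),
    "order_by": re.compile(r"\bORDER\s+BY\b", re.IGNORECASE),
}

def accuracy_by_characteristic(results):
    """Map each characteristic to the accuracy over queries that exhibit it.

    results: iterable of (gold_sql, is_correct) pairs.
    A query may fall into several groups, or into none.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for gold_sql, is_correct in results:
        for name, pattern in CHARACTERISTICS.items():
            if pattern.search(gold_sql):
                totals[name] += 1
                hits[name] += int(is_correct)
    return {name: hits[name] / totals[name] for name in totals}

# Made-up per-query outcomes for one NL2SQL method.
results = [
    ("SELECT a.x FROM a JOIN b ON a.id = b.id", True),
    ("SELECT a.x FROM a JOIN b ON a.id = b.id ORDER BY a.x", False),
    ("SELECT x FROM a WHERE id IN (SELECT id FROM b)", True),
    ("SELECT x FROM a ORDER BY x", True),
]
breakdown = accuracy_by_characteristic(results)
```

Running the same breakdown for several methods would show whether a method with a strong overall score is weaker on a particular query type, which is the kind of scenario-specific comparison the abstract describes.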

Taxonomy

Topics: Scientific Computing and Data Management