ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems
Yi Zhang, Jan Deriu, George Katsogiannis-Meimarakis, Catherine Kosten, Georgia Koutrika, Kurt Stockinger

TL;DR
ScienceBenchmark introduces a challenging, real-world NL-to-SQL benchmark based on complex scientific databases, highlighting the limitations of current systems trained on simpler datasets and emphasizing the need for domain-specific data augmentation.
Contribution
The paper presents ScienceBenchmark, a new complex NL-to-SQL benchmark with domain-specific databases and high-quality NL/SQL pairs, created with expert input and synthetic data augmentation.
Findings
Current top NL-to-SQL systems perform poorly on ScienceBenchmark.
ScienceBenchmark is more challenging than existing benchmarks like Spider.
Synthetic data helps improve training for complex domains.
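To make the findings above concrete, here is a minimal sketch of how a single NL/SQL pair could be scored by execution accuracy, a metric commonly used for NL-to-SQL systems. This summary does not say which metric the paper reports, and the `projects` table, the question, and the candidate queries below are hypothetical illustrations rather than data from ScienceBenchmark.

```python
import sqlite3


def execution_match(conn, gold_sql, predicted_sql):
    """Return True if both queries return the same rows (order ignored)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # SQL that fails to run counts as a miss
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


def build_toy_db():
    """Create a tiny in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE projects (id INTEGER PRIMARY KEY, title TEXT, start_year INTEGER);
        INSERT INTO projects VALUES (1, 'Alpha', 2015), (2, 'Beta', 2019), (3, 'Gamma', 2019);
        """
    )
    return conn


conn = build_toy_db()

# One NL/SQL pair: a question plus its reference ("gold") query.
question = "How many projects started in 2019?"
gold_sql = "SELECT COUNT(*) FROM projects WHERE start_year = 2019"

# Hypothetical system outputs for the same question.
predictions = [
    "SELECT COUNT(id) FROM projects WHERE start_year = 2019",  # different text, same result
    "SELECT COUNT(*) FROM projects WHERE start_year > 2019",   # wrong filter
    "SELECT COUNT(* FROM projects",                            # malformed SQL
]

results = [execution_match(conn, gold_sql, p) for p in predictions]
accuracy = sum(results) / len(results)
print(results, accuracy)
```

Comparing result sets rather than query strings means a prediction that is written differently from the gold query can still count as correct. On large domain-specific schemas the same check applies, but there are far more tables and columns for a system to confuse.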
Abstract
Natural Language to SQL systems (NL-to-SQL) have recently shown a significant increase in accuracy for natural language to SQL query translation. This improvement is due to the emergence of transformer-based language models and the popularity of the Spider benchmark, the de facto standard for evaluating NL-to-SQL systems. The top NL-to-SQL systems reach accuracies of up to 85%. However, Spider mainly contains simple databases with few tables, columns, and entries, which does not reflect a realistic setting. Moreover, complex real-world databases with domain-specific content have little to no training data available in the form of NL/SQL pairs, leading to poor performance of existing NL-to-SQL systems. In this paper, we introduce ScienceBenchmark, a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases. For this new benchmark, SQL experts and domain…
Taxonomy
Topics: Topic Modeling · Scientific Computing and Data Management · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Linear Layer · Attention Dropout · Cosine Annealing · Residual Connection
