StatBot.Swiss: Bilingual Open Data Exploration in Natural Language
Farhad Nooralahzadeh, Yi Zhang, Ellery Smith, Sabine Maennel, Cyril Matthey-Doret, Raphaël de Fondville, Kurt Stockinger

TL;DR
This paper introduces StatBot.Swiss, a bilingual dataset for Text-to-SQL tasks in English and German, and evaluates how current LLMs perform on this new benchmark, revealing their limitations in multilingual SQL generation.
Contribution
The creation of the first bilingual real-world Text-to-SQL dataset, StatBot.Swiss, and an analysis of LLMs' performance on this challenging multilingual benchmark.
Findings
LLMs struggle to generalize in bilingual SQL generation.
Current models perform poorly on the new dataset.
Bilingual datasets reveal limitations of existing LLMs.
Abstract
The potential for improvements brought by Large Language Models (LLMs) in Text-to-SQL systems is mostly assessed on monolingual English datasets. However, LLMs' performance for other languages remains largely unexplored. In this work, we release the StatBot.Swiss dataset, the first bilingual benchmark for evaluating Text-to-SQL systems based on real-world applications. The StatBot.Swiss dataset contains 455 natural language/SQL pairs over 35 big databases with varying levels of complexity for both English and German. We evaluate the performance of state-of-the-art LLMs such as GPT-3.5-Turbo and mixtral-8x7b-instruct for the Text-to-SQL translation task using an in-context learning approach. Our experimental analysis illustrates that current LLMs struggle to generalize well in generating SQL queries on our novel bilingual dataset.
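To make the in-context learning setup mentioned in the abstract more concrete, here is a minimal sketch of how a few-shot Text-to-SQL prompt might be assembled. It is not the authors' actual prompt: the template wording, the `Example` class, the `build_prompt` function, and the toy `population` table with its questions are all hypothetical and invented purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    """One natural-language/SQL pair used as an in-context demonstration."""
    question: str
    sql: str


def build_prompt(schema: str, examples: list[Example], question: str) -> str:
    """Assemble a few-shot prompt: instruction, schema, demonstrations, new question.

    The layout is an assumption for illustration, not the template from the paper.
    """
    parts = [
        "Translate the question into a SQL query for the schema below.",
        "",
        "Schema:",
        schema.strip(),
        "",
    ]
    # Each demonstration shows the model one question and its reference SQL.
    for ex in examples:
        parts += [f"Question: {ex.question}", f"SQL: {ex.sql}", ""]
    # The prompt ends with an open "SQL:" slot for the model to complete.
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)


# Toy schema and questions (invented, not taken from StatBot.Swiss).
schema = "CREATE TABLE population (canton TEXT, year INT, residents INT);"
demos = [
    Example(
        "How many residents did the canton of Bern have in 2019?",
        "SELECT residents FROM population WHERE canton = 'Bern' AND year = 2019;",
    ),
    Example(
        "Wie viele Einwohner hatte der Kanton Genf im Jahr 2018?",
        "SELECT residents FROM population WHERE canton = 'Genf' AND year = 2018;",
    ),
]
prompt = build_prompt(
    schema, demos, "Wie viele Einwohner hatte der Kanton Zürich im Jahr 2020?"
)
print(prompt)
```

The resulting string would be sent to an LLM, and the returned completion compared against the reference SQL. Mixing English and German demonstrations, as in this toy example, is one possible design choice for a bilingual benchmark; whether the paper does so is not stated in the abstract.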
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Adam · Cosine Annealing · Residual Connection · Multi-Head Attention · Dropout · Dense Connections
