Archer: A Human-Labeled Text-to-SQL Dataset with Arithmetic, Commonsense and Hypothetical Reasoning
Danna Zheng, Mirella Lapata, Jeff Z. Pan

TL;DR
Archer is a bilingual (English and Chinese) text-to-SQL dataset focused on complex reasoning. Its high complexity and diverse reasoning types challenge current models and highlight the need for more advanced approaches.
Contribution
The paper introduces Archer, a bilingual text-to-SQL dataset built around arithmetic, commonsense, and hypothetical reasoning, with significantly higher complexity than existing publicly available datasets.
Findings
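Current state-of-the-art models perform poorly on Archer: a high-ranked model on the Spider leaderboard achieves only 6.73% execution accuracy on the Archer test set.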
Archer covers 20 domains and includes arithmetic, commonsense, and hypothetical reasoning.
Archer's high complexity relative to existing public datasets demonstrates the need for improved models.
Abstract
We present Archer, a challenging bilingual text-to-SQL dataset specific to complex reasoning, including arithmetic, commonsense and hypothetical reasoning. It contains 1,042 English questions and 1,042 Chinese questions, along with 521 unique SQL queries, covering 20 English databases across 20 domains. Notably, this dataset demonstrates a significantly higher level of complexity compared to existing publicly available datasets. Our evaluation shows that Archer challenges the capabilities of current state-of-the-art models, with a high-ranked model on the Spider leaderboard achieving only 6.73% execution accuracy on the Archer test set. Thus, Archer presents a significant challenge for future research in this field.
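Execution accuracy, the metric quoted in the abstract, is commonly computed by running the predicted SQL and the gold SQL against the same database and checking whether the results match. The following is a minimal sketch of that idea, assuming a SQLite database and an order-insensitive comparison of result rows. The function names `run_query` and `execution_accuracy`, the toy `product` table, and the example queries are illustrative assumptions; this is not the paper's evaluation script, and the queries are not drawn from Archer.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match.

    Assumption: rows are compared as multisets, so row order is ignored.
    """
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        pred = run_query(conn, predicted_sql)
        if gold is not None and pred == gold:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database, used only to exercise the metric.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL, cost REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("pen", 3.0, 1.0), ("book", 12.0, 7.0)],
)

pairs = [
    # Arithmetic on columns; both queries return the same rows.
    ("SELECT name FROM product WHERE price - cost > 4",
     "SELECT name FROM product WHERE (price - cost) > 4"),
    # The prediction ignores the cost column, so the results differ.
    ("SELECT name FROM product WHERE price > 4",
     "SELECT name FROM product WHERE price - cost > 1"),
    # The prediction references a misspelled column and fails to execute.
    ("SELECT nme FROM product",
     "SELECT name FROM product"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.2%}")
```

In this sketch a prediction that fails to execute counts as incorrect, and only one of the three toy pairs matches. Comparing rows as multisets is one design choice among several; an evaluation that cares about `ORDER BY` would need to compare ordered lists instead.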
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Mathematics, Computing, and Information Processing
