LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Complex Reasoning
Tao Liu, Xutao Mao, Hongying Zan, Dixuan Zhang, Yifan Li, Haixin Liu, Lulu Kong, Jiaming Hou, Rui Li, YunLong Li, aoze zheng, Zhiqiang Zhang, Luo Zhewei, Kunli Zhang, Min Peng

TL;DR
LogicCat is a new complex-reasoning benchmark for Text-to-SQL that emphasizes domain knowledge, mathematical reasoning, and hypothetical reasoning to better reflect real-world data querying challenges.
Contribution
This paper presents the first dataset specifically designed for complex reasoning in Text-to-SQL, including chain-of-thought steps across diverse domains, surpassing existing datasets in complexity.
Findings
State-of-the-art models achieve at most 33.20% accuracy on LogicCat.
LogicCat is substantially harder for current models than existing Text-to-SQL datasets.
The dataset covers physics, arithmetic, commonsense, and hypothetical reasoning.
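To make the kind of question described above concrete, here is a minimal sketch of a Text-to-SQL item that mixes physics knowledge with a hypothetical condition, scored by comparing execution results. Everything in it is an assumption made for illustration: the `objects` table, its rows, the question, the reasoning steps, and the helpers `execution_match` and `accuracy` are hypothetical and are not taken from LogicCat. The summary above does not say how accuracy is computed, so execution matching is used here only as one common choice.

```python
import sqlite3

# Hypothetical schema and rows, invented for this sketch (not from LogicCat).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (name TEXT, mass_kg REAL, velocity_m_s REAL);
    INSERT INTO objects VALUES
        ('cart', 10.0, 3.0),
        ('ball', 0.5, 20.0),
        ('truck', 2000.0, 1.0),
        ('feather', 0.01, 1.0);
""")

# Hypothetical question: "If every object's velocity doubled, which objects
# would have a kinetic energy above 150 J?"
# Illustrative reasoning steps:
#   1. Domain knowledge: kinetic energy is 0.5 * m * v^2.
#   2. Hypothetical condition: replace v with 2 * v.
#   3. Filter on the resulting value.
gold_sql = (
    "SELECT name FROM objects "
    "WHERE 0.5 * mass_kg * (2 * velocity_m_s) * (2 * velocity_m_s) > 150"
)
# A plausible wrong prediction that ignores the hypothetical doubling.
predicted_sql = (
    "SELECT name FROM objects "
    "WHERE 0.5 * mass_kg * velocity_m_s * velocity_m_s > 150"
)


def execution_match(connection, predicted, gold):
    """Return True if both queries yield the same rows, ignoring row order."""
    try:
        predicted_rows = connection.execute(predicted).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = connection.execute(gold).fetchall()
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


def accuracy(connection, pairs):
    """Fraction of (predicted, gold) query pairs whose results match."""
    matches = sum(execution_match(connection, p, g) for p, g in pairs)
    return matches / len(pairs)


pairs = [(gold_sql, gold_sql), (predicted_sql, gold_sql)]
print(accuracy(conn, pairs))  # 0.5: one of the two predictions matches
```

In this toy case the wrong prediction returns only `truck`, while the gold query also returns `cart` and `ball`, so a model that skips the hypothetical step is penalized even though its SQL is syntactically valid.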
Abstract
Text-to-SQL is a critical task in natural language processing that aims to transform natural language questions into accurate and executable SQL queries. In real-world scenarios, these reasoning tasks are often accompanied by complex mathematical computations, domain knowledge, and hypothetical reasoning scenarios. However, existing large-scale Text-to-SQL datasets typically focus on business logic and task logic, neglecting critical factors such as vertical domain knowledge, complex mathematical reasoning, and hypothetical reasoning, which are essential for realistically reflecting the reasoning demands in practical applications and completing data querying and analysis. To bridge this gap, we introduce LogicCat, the first Text-to-SQL benchmark dataset specifically designed for complex reasoning and chain-of-thought parsing, encompassing physics, arithmetic, commonsense, and…
