LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Complex Reasoning

Tao Liu; Xutao Mao; Hongying Zan; Dixuan Zhang; Yifan Li; Haixin Liu; Lulu Kong; Jiaming Hou; Rui Li; YunLong Li; aoze zheng; Zhiqiang Zhang; Luo Zhewei; Kunli Zhang; Min Peng

arXiv:2505.18744 · cs.CL · September 10, 2025

TL;DR

LogicCat is a new complex-reasoning benchmark for Text-to-SQL that emphasizes domain knowledge, mathematical reasoning, and hypothetical reasoning to better reflect real-world data querying challenges.

Contribution

This paper presents LogicCat, which the authors describe as the first Text-to-SQL dataset designed specifically for complex reasoning. It includes chain-of-thought steps across diverse domains and exceeds existing datasets in complexity.

Findings

01

State-of-the-art models achieve at most 33.20% accuracy on LogicCat.

02

LogicCat significantly increases task difficulty for current models.

03

The dataset covers physics, arithmetic, commonsense, and hypothetical reasoning.

Abstract

Text-to-SQL is a critical task in natural language processing that aims to transform natural language questions into accurate and executable SQL queries. In real-world scenarios, these reasoning tasks are often accompanied by complex mathematical computations, domain knowledge, and hypothetical reasoning scenarios. However, existing large-scale Text-to-SQL datasets typically focus on business logic and task logic, neglecting critical factors such as vertical domain knowledge, complex mathematical reasoning, and hypothetical reasoning, which are essential for realistically reflecting the reasoning demands in practical applications and completing data querying and analysis. To bridge this gap, we introduce LogicCat, the first Text-to-SQL benchmark dataset specifically designed for complex reasoning and chain-of-thought parsing, encompassing physics, arithmetic, commonsense, and…
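As a rough illustration of the task described in the abstract, the sketch below pairs a question that needs domain knowledge and arithmetic with two candidate SQL queries and checks whether they return the same rows. Everything in it is an assumption made for illustration: the `objects` table, its rows, the question, both queries, and the `execution_match` helper are hypothetical and are not taken from LogicCat, and the summary above does not state which accuracy metric the paper uses.

```python
import sqlite3

def build_db():
    """Create a tiny in-memory database (hypothetical schema, not from LogicCat)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (name TEXT, mass_kg REAL, velocity_ms REAL)"
    )
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?)",
        [("cart", 2.0, 3.0), ("ball", 0.5, 10.0), ("crate", 10.0, 1.0)],
    )
    return conn

# Hypothetical question: "Which objects have kinetic energy above 8 joules?"
# Answering it requires the physics formula KE = 0.5 * m * v^2,
# which is not stored anywhere in the schema.
GOLD_SQL = (
    "SELECT name FROM objects "
    "WHERE 0.5 * mass_kg * velocity_ms * velocity_ms > 8"
)

# A plausible wrong prediction: it filters on momentum (m * v) instead.
PRED_SQL = "SELECT name FROM objects WHERE mass_kg * velocity_ms > 8"

def execution_match(conn, pred_sql, gold_sql):
    """Return True if both queries run and yield the same rows, ignoring order."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(pred_rows) == sorted(gold_rows)

if __name__ == "__main__":
    conn = build_db()
    print(execution_match(conn, PRED_SQL, GOLD_SQL))
    print(execution_match(conn, GOLD_SQL, GOLD_SQL))
```

Comparing result sets rather than query strings is a common design choice for Text-to-SQL scoring, because two differently written queries can be equally correct. The wrong prediction here is syntactically valid and still fails, which is the kind of reasoning error the benchmark is described as targeting.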
