TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data
Changjiang Jiang, Fengchang Yu, Haihua Chen, Wei Lu, Jin Zeng

TL;DR
TabDSR is a novel framework that decomposes complex questions, sanitizes noisy tables, and uses program-of-thoughts reasoning to significantly improve large language models' performance on complex numerical reasoning tasks over tables.
Contribution
We introduce TabDSR, a comprehensive framework combining question decomposition, table sanitization, and program-of-thoughts reasoning, along with a new dataset CalTab151 for unbiased evaluation.
Findings
Achieves state-of-the-art accuracy on TAT-QA, TableBench, and the newly introduced CalTab151 dataset.
Effectively integrates with mainstream LLMs for complex numerical reasoning.
Demonstrates robustness and improved performance over existing methods.
Abstract
Complex reasoning over tabular data is crucial in real-world data analysis, yet large language models (LLMs) often underperform due to complex queries, noisy data, and limited numerical capabilities. To address these issues, we propose TabDSR, a framework consisting of: (1) a query decomposer that breaks down complex questions, (2) a table sanitizer that cleans and filters noisy tables, and (3) a program-of-thoughts (PoT)-based reasoner that generates executable code to derive the final answer from the sanitized table. To ensure unbiased evaluation and mitigate data leakage, we introduce a new dataset, CalTab151, specifically designed for complex numerical reasoning over tables. Experimental results demonstrate that TabDSR consistently outperforms existing methods, achieving state-of-the-art (SOTA) performance with 8.79%, 6.08%, and 19.87% accuracy improvement on TAT-QA, TableBench, and…
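The three stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`decompose`, `sanitize_cell`, `sanitize_table`, `run_program`), the cleaning rules, the sample table, and the hand-written "generated" program are all assumptions made for illustration. In TabDSR the decomposer and the reasoner are LLM-driven; here they are replaced by trivial stand-ins so that the data flow is visible.

```python
import re

def decompose(question):
    """Placeholder for the LLM-based query decomposer (assumption): split on ';'."""
    return [part.strip() for part in question.split(";") if part.strip()]

def sanitize_cell(cell):
    """Turn a noisy numeric string into a float; keep non-numeric text as-is."""
    if not isinstance(cell, str):
        return cell
    text = cell.strip()
    if text in {"", "-", "N/A", "n/a"}:
        return None  # treat common placeholders as missing values
    # Accounting-style negatives such as "(12.5)" mean -12.5.
    negative = text.startswith("(") and text.endswith(")")
    cleaned = re.sub(r"[,$%()\s]", "", text)
    try:
        value = float(cleaned)
    except ValueError:
        return text  # not a number, so return the trimmed text
    return -value if negative else value

def sanitize_table(header, rows):
    """Trim header names and clean every cell; return a list of row dicts."""
    clean_header = [h.strip() for h in header]
    return [
        dict(zip(clean_header, (sanitize_cell(c) for c in row)))
        for row in rows
    ]

def run_program(program, table):
    """Execute a generated program against the sanitized table.

    The program is expected to assign its result to a variable named
    `answer` (a convention assumed for this sketch). Builtins are limited
    to a small whitelist, which is a convenience, not a security sandbox.
    """
    namespace = {
        "__builtins__": {"sum": sum, "len": len, "round": round},
        "table": table,
    }
    exec(program, namespace)
    return namespace["answer"]

# Hypothetical noisy table: stray spaces, currency symbols, a missing value.
header = ["Year ", " Revenue"]
rows = [["FY2019", "$1,200"], ["FY2020", "$1,500"], ["FY2021", "N/A"]]
table = sanitize_table(header, rows)

sub_questions = decompose(
    "What was revenue in FY2019 and FY2020; "
    "what is the percentage change between them"
)

# Stand-in for the code a program-of-thoughts reasoner might generate.
program = """
rev = {row["Year"]: row["Revenue"] for row in table}
answer = round((rev["FY2020"] - rev["FY2019"]) / rev["FY2019"] * 100, 2)
"""
print(sub_questions)
print(run_program(program, table))
```

Delegating the arithmetic to executed code, rather than asking the model to compute the number in free text, is the core idea behind program-of-thoughts reasoning; sanitizing first means the generated code can assume numeric cells really are numbers.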
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Big Data and Digital Economy
