SciTaRC: Benchmarking QA on Scientific Tabular Data that Requires Language Reasoning and Complex Computation

Hexuan Wang; Yaxuan Ren; Srikar Bommireddypalli; Shuxian Chen; Adarsh Prabhudesai; Rongkun Zhou; Elina Baral; Philipp Koehn

arXiv:2603.08910 · cs.CL · March 11, 2026

TL;DR

SciTaRC is a benchmark for scientific question answering on tables that tests deep language reasoning and complex computation, revealing significant performance gaps in current AI models.

Contribution

The paper introduces SciTaRC, a new benchmark dataset for scientific table question answering that highlights the limitations of existing models in language reasoning and computation.

Findings

1. State-of-the-art models fail on at least 23% of questions.
2. Llama-3.3-70B-Instruct fails on 65.5% of tasks.
3. Both code and language models face an execution bottleneck.

Abstract

We introduce SciTaRC, an expert-authored benchmark of questions about tabular data in scientific papers requiring both deep language reasoning and complex computation. We show that current state-of-the-art AI models fail on at least 23% of these questions, a gap that remains significant even for highly capable open-weight models like Llama-3.3-70B-Instruct, which fails on 65.5% of the tasks. Our analysis reveals a universal "execution bottleneck": both code and language models struggle to faithfully execute plans, even when provided with correct strategies. Specifically, code-based methods prove brittle on raw scientific tables, while natural language reasoning primarily fails due to initial comprehension issues and calculation errors.
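
One way to picture why code-based methods can be brittle on raw scientific tables is the sketch below. It is a hypothetical illustration, not the paper's pipeline: the cell strings, `naive_parse`, and `tolerant_parse` are all assumptions made up for this example, and the benchmark's actual table format may differ. It shows a direct `float()` conversion failing on cells that carry uncertainty ranges, markup, or placeholders, while a more forgiving parser recovers the leading number.

```python
import re

# Hypothetical raw cells, as they might look in a table extracted from a paper.
# These values are invented for illustration and do not come from SciTaRC.
raw_cells = ["85.3", "84.1 ± 0.4", "\\textbf{86.0}", "-"]


def naive_parse(cell):
    """Convert a cell directly to float; raises ValueError on anything else."""
    return float(cell)


def tolerant_parse(cell):
    """Return the first number found in the cell, or None if there is none."""
    match = re.search(r"-?\d+(?:\.\d+)?", cell)
    return float(match.group()) if match else None


# Record None wherever the naive conversion raises.
naive_results = []
for cell in raw_cells:
    try:
        naive_results.append(naive_parse(cell))
    except ValueError:
        naive_results.append(None)

tolerant_results = [tolerant_parse(cell) for cell in raw_cells]

print(naive_results)     # [85.3, None, None, None]
print(tolerant_results)  # [85.3, 84.1, 86.0, None]
```

Even the tolerant version silently discards the uncertainty term and the emphasis markup, so a program that runs without error can still lose information the question depends on.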

Taxonomy

Topics: Topic Modeling · Scientific Computing and Data Management · Natural Language Processing Techniques