SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables

Xinyuan Lu; Liangming Pan; Qian Liu; Preslav Nakov; Min-Yen Kan

arXiv:2305.13186 · cs.CL · October 24, 2023 · 2 citations

TL;DR

SCITAB is a new, challenging benchmark dataset for scientific claim verification that emphasizes compositional reasoning over scientific tables, revealing the limitations of current models, including large language models.

Contribution

We introduce SCITAB, a novel dataset that challenges models to verify scientific claims using tables, highlighting gaps in current AI reasoning capabilities.

Findings

1. Most models perform barely above random chance on SCITAB.

2. GPT-4 is the only model that performs significantly better than the others.

3. Prompting techniques such as Chain-of-Thought do not substantially improve performance.

Abstract

Current scientific fact-checking benchmarks exhibit several shortcomings, such as biases arising from crowd-sourced claims and an over-reliance on text-based evidence. We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims that 1) originate from authentic scientific publications and 2) require compositional reasoning for verification. The claims are paired with evidence-containing scientific tables and annotated with labels. Through extensive evaluations, we demonstrate that SCITAB poses a significant challenge to state-of-the-art models, including table-based pretraining models and large language models. All models except GPT-4 achieved performance barely above random guessing. Popular prompting techniques, such as Chain-of-Thought, yield little performance gain on SCITAB. Our analysis uncovers several unique challenges posed by…
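The abstract measures models against random guessing. The snippet below is a minimal sketch of how a uniform random baseline could be scored, assuming a three-way label set and macro-averaged F1 as the metric. The `ClaimExample` record, its field names, and the label strings are hypothetical illustrations, not the schema of the official repository.

```python
import random
from dataclasses import dataclass

# Hypothetical label set; the official SCITAB label strings may differ.
LABELS = ("supported", "refuted", "not enough info")


@dataclass
class ClaimExample:
    """Hypothetical record: one claim paired with an evidence table."""
    claim: str
    table_header: list[str]
    table_rows: list[list[str]]
    label: str  # assumed to be one of LABELS


def macro_f1(gold: list[str], pred: list[str], labels=LABELS) -> float:
    """Unweighted mean of per-class F1; a class with no support and no predictions scores 0."""
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def random_baseline(examples: list[ClaimExample], seed: int = 0) -> list[str]:
    """Predict a label uniformly at random for each example (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [rng.choice(LABELS) for _ in examples]
```

A model whose macro-F1 sits only slightly above the score of `random_baseline` on the same examples is what the abstract describes as "barely above random guessing". Macro averaging weights each label equally, so a model cannot score well simply by favoring the most frequent label.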

Code & Models

Repositories

xinyuanlu00/scitab (official)

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Absolute Position Encodings · Adam · Byte Pair Encoding