TSVer: A Benchmark for Fact Verification Against Time-Series Evidence

Marek Strong; Andreas Vlachos

arXiv:2511.01101 · cs.CL · April 21, 2026

TL;DR

TSVer is a benchmark for fact verification that requires temporal and numerical reasoning over real-world time-series evidence. It addresses limitations of existing datasets, which often lack structured evidence, provide insufficient justifications for verdicts, or rely on synthetic claims.

Contribution

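The paper presents TSVer, a dataset of 304 real-world claims from 41 fact-checking organizations, paired with a curated database of 400 time series and annotated with time frames, verdicts, and justifications. It also establishes baseline performance for fact verification against time-series evidence.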

Findings

01. The annotation process achieved an inter-annotator agreement of κ = 0.77 on verdicts.

02. Even state-of-the-art models such as Gemini-2.5-Pro reach only 63.57% accuracy on verdicts.

03. Current models struggle to reason over time-series evidence, leaving substantial room for improvement.
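The findings report annotator agreement as a kappa statistic and model performance as verdict accuracy. The page does not say which kappa variant TSVer uses; the Python sketch below assumes Cohen's kappa for two annotators labelling the same claims. The function names, label strings, and toy data are hypothetical and not taken from the dataset.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators (assumed variant; hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def verdict_accuracy(predicted, gold):
    """Fraction of claims whose predicted verdict matches the gold verdict."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy example with hypothetical labels, not TSVer data.
a = ["SUP", "SUP", "REF", "REF"]
b = ["SUP", "REF", "REF", "REF"]
print(f"{cohens_kappa(a, b):.2f}")  # → 0.50
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over plain percentage agreement when label distributions are skewed.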

Abstract

Reasoning over temporal and numerical data, such as time series, is a crucial aspect of fact-checking. While many systems have recently been developed to handle this form of evidence, their evaluation remains limited by existing datasets, which often lack structured evidence, provide insufficient justifications for verdicts, or rely on synthetic claims. In this paper, we introduce TSVer, a new benchmark dataset for fact verification focusing on temporal and numerical reasoning with time-series evidence. TSVer contains 304 real-world claims sourced from 41 fact-checking organizations and a curated database of 400 time series covering diverse domains. Each claim is annotated with time frames across all pertinent time series, along with a verdict and justifications reflecting how the evidence is used to reach the verdict. Using an LLM-assisted multi-step annotation process, we improve the…
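The abstract describes each claim as annotated with time frames across all pertinent time series, plus a verdict and a justification. As a rough illustration of what such a record might look like, the Python sketch below defines hypothetical `TimeFrame` and `ClaimAnnotation` types and a hypothetical `select_evidence` helper that keeps only the observations inside each annotated frame. The field names, the inclusive date bounds, and the `(date, value)` series format are assumptions, not TSVer's released schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TimeFrame:
    series_id: str  # hypothetical id of a series in the evidence database
    start: date     # inclusive lower bound (assumption)
    end: date       # inclusive upper bound (assumption)


@dataclass
class ClaimAnnotation:
    claim: str
    time_frames: list[TimeFrame]
    verdict: str
    justification: str


def select_evidence(annotation, database):
    """Collect, per series, the (date, value) points inside each annotated frame."""
    evidence = {}
    for frame in annotation.time_frames:
        series = database[frame.series_id]  # assumed: list of (date, value) pairs
        # A series may carry several frames, so extend rather than overwrite.
        evidence.setdefault(frame.series_id, []).extend(
            (d, v) for d, v in series if frame.start <= d <= frame.end
        )
    return evidence
```

Restricting each series to its annotated window is one way a verifier could be handed only the evidence relevant to a claim rather than the full 400-series database.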
