All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection

Yuechen Jiang; Zhiwei Liu; Yupeng Cao; Yueru He; Ziyang Xu; Chen Xu; Zhiyang Deng; Prayag Tiwari; Xi Chen; Alejandro Lopez-Lira; Jimin Huang; Junichi Tsujii; Sophia Ananiadou

arXiv:2601.04160·cs.CL·January 12, 2026

TL;DR

This paper presents RFC Bench, a new benchmark for evaluating large language models' ability to detect financial misinformation without external references, highlighting current models' weaknesses in maintaining coherent beliefs in complex news contexts.

Contribution

Introduces RFC Bench, a novel benchmark for reference-free financial misinformation detection, emphasizing the importance of contextual reasoning and exposing current model limitations.

Findings

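01 Performance is substantially stronger when comparative context (paired original and perturbed inputs) is available.

02 Reference-free detection exposes significant weaknesses, including unstable predictions and elevated rates of invalid outputs.

03 Models struggle to maintain coherent belief states without external grounding.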

Abstract

We introduce RFC Bench, a benchmark for evaluating large language models on financial misinformation under realistic news. RFC Bench operates at the paragraph level and captures the contextual complexity of financial news, where meaning emerges from dispersed cues. The benchmark defines two complementary tasks: reference-free misinformation detection and comparison-based diagnosis using paired original-perturbed inputs. Experiments reveal a consistent pattern: performance is substantially stronger when comparative context is available, while reference-free settings expose significant weaknesses, including unstable predictions and elevated invalid outputs. These results indicate that current models struggle to maintain coherent belief states without external grounding. By highlighting this gap, RFC Bench provides a structured testbed for studying reference-free reasoning and advancing…
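The two task settings and the failure modes named in the abstract (invalid outputs, unstable predictions) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the label set, the prompt wording, and the function names are hypothetical and are not taken from the paper or its released data.

```python
# Hypothetical label set; the abstract does not state the benchmark's actual labels.
VALID_LABELS = {"original", "perturbed"}


def reference_free_prompt(paragraph: str) -> str:
    """Reference-free setting: the model sees a single paragraph and nothing else."""
    return (
        "Decide whether this financial news paragraph is 'original' or 'perturbed'.\n"
        f"Paragraph: {paragraph}\nAnswer:"
    )


def comparison_prompt(original: str, perturbed: str) -> str:
    """Comparison setting: the model sees both versions of the paragraph as a pair."""
    return (
        "Two versions of a financial news paragraph follow. Describe how B differs from A.\n"
        f"Paragraph A: {original}\nParagraph B: {perturbed}\nAnswer:"
    )


def parse_label(raw: str):
    """Return the normalized label, or None when the output is not a valid label."""
    text = raw.strip().lower()
    return text if text in VALID_LABELS else None


def invalid_rate(outputs):
    """Fraction of raw model outputs that do not parse to a valid label."""
    if not outputs:
        return 0.0
    return sum(parse_label(o) is None for o in outputs) / len(outputs)


def instability_rate(repeated_outputs):
    """Fraction of items whose valid labels disagree across repeated runs.

    repeated_outputs holds one inner list per item, containing the raw
    outputs from several runs on that same item.
    """
    if not repeated_outputs:
        return 0.0
    unstable = 0
    for runs in repeated_outputs:
        labels = {parse_label(o) for o in runs} - {None}
        if len(labels) > 1:
            unstable += 1
    return unstable / len(repeated_outputs)
```

Under these assumptions, an output such as "unsure" counts toward the invalid rate, and an item answered "original" in one run and "perturbed" in another counts as unstable. How the paper itself defines and measures these quantities may differ.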

Taxonomy

Topics: Misinformation and Its Impacts · Explainable Artificial Intelligence (XAI) · Benford’s Law and Fraud Detection