AtomEval: Atomic Evaluation of Adversarial Claims in Fact Verification

Hongyi Cen

arXiv:2604.07967·cs.CL·April 28, 2026

TL;DR

AtomEval is a new evaluation framework for fact verification that decomposes claims into atomic components and scores their validity, improving the detection of factual inconsistencies in adversarial rewrites.

Contribution

The paper introduces AtomEval, whose Atomic Validity Scoring (AVS) evaluates adversarial claim rewrites by their factual validity rather than by surface similarity alone.
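Surface similarity and factual validity can diverge. The minimal sketch below illustrates this with token-level Jaccard overlap, used as a stand-in for surface-similarity metrics. This is an illustrative choice, not necessarily one of the paper's baselines. The hypothetical claim pair differs by a single year, which makes the rewrite false, yet the overlap score stays high.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap: a crude stand-in for surface-similarity metrics."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    # Two empty strings are treated as identical.
    return len(ta & tb) / len(union) if union else 1.0


# Hypothetical claim pair: only the year changes, but the rewrite is now false.
original = "Marie Curie won the Nobel Prize in Physics in 1903"
rewrite = "Marie Curie won the Nobel Prize in Physics in 1911"

print(f"{jaccard(original, rewrite):.2f}")  # 0.80
```

A one-token edit that flips the claim's truth value still scores 0.80. A validity-aware metric is meant to catch exactly this kind of rewrite.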

Findings

01

AtomEval provides more reliable evaluation signals than standard metrics.

02

Stronger LLMs do not always produce more effective adversarial claims.

03

AtomEval reveals limitations in current adversarial evaluation practices.

Abstract

Adversarial claim rewriting is widely used to test fact-checking systems, but standard metrics fail to capture truth-conditional consistency and often label semantically corrupted rewrites as successful. We introduce AtomEval, a validity-aware evaluation framework that decomposes claims into subject-relation-object-modifier (SROM) atoms and scores adversarial rewrites with Atomic Validity Scoring (AVS), enabling detection of factual corruption beyond surface similarity. Experiments on the FEVER dataset across representative attack strategies and LLM generators show that AtomEval provides more reliable evaluation signals than standard metrics. Using AtomEval, we further analyze LLM-based adversarial generators and observe that stronger models do not necessarily produce more effective adversarial claims under validity-aware evaluation, highlighting previously overlooked limitations in…
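The abstract does not specify how AVS is computed. The following is therefore only a minimal sketch under stated assumptions. SROM atoms are written by hand here, whereas a real pipeline would extract them automatically. The score is simply the fraction of source atoms that survive unchanged in the rewrite. The `Atom` class, the `atom_validity` function, and this scoring rule are all hypothetical stand-ins, not the paper's AVS definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Atom:
    """One subject-relation-object-modifier (SROM) atom; field names are illustrative."""
    subject: str
    relation: str
    obj: str
    modifier: Optional[str] = None


def _norm(s: Optional[str]) -> Optional[str]:
    # Case-fold and strip so trivial formatting differences are not counted as corruption.
    return s.strip().lower() if s is not None else None


def normalize(atom: Atom) -> Atom:
    return Atom(_norm(atom.subject), _norm(atom.relation), _norm(atom.obj), _norm(atom.modifier))


def atom_validity(source: list[Atom], rewrite: list[Atom]) -> float:
    """Hypothetical score: fraction of source atoms preserved exactly in the rewrite.

    This is a stand-in for AVS, not the paper's formula.
    """
    if not source:
        return 1.0
    kept = {normalize(a) for a in rewrite}
    preserved = sum(1 for a in source if normalize(a) in kept)
    return preserved / len(source)


# Hand-written atoms for "Marie Curie, a Polish-born physicist, won the Nobel Prize in Physics in 1903."
source = [
    Atom("Marie Curie", "is", "physicist", "Polish-born"),
    Atom("Marie Curie", "won", "Nobel Prize in Physics", "in 1903"),
]
# A paraphrase whose atoms match the source after normalization (order does not matter).
paraphrase = [
    Atom("marie curie", "won", "Nobel Prize in Physics", "in 1903"),
    Atom("Marie Curie", "is", "physicist", "Polish-born "),
]
# A corrupted rewrite: the year modifier changed, altering the claim's truth conditions.
corrupted = [
    Atom("Marie Curie", "is", "physicist", "Polish-born"),
    Atom("Marie Curie", "won", "Nobel Prize in Physics", "in 1911"),
]

print(atom_validity(source, paraphrase))  # 1.0
print(atom_validity(source, corrupted))   # 0.5
```

Exact matching after case folding keeps the sketch simple. It treats any changed modifier, such as a date, as factual corruption, while a paraphrase that yields the same atoms is scored as fully valid.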