Long-Form Information Alignment Evaluation Beyond Atomic Facts

Danna Zheng; Mirella Lapata; Jeff Z. Pan

arXiv:2505.15792 · cs.CL · May 22, 2025

TL;DR

This paper introduces DoveScore, a framework for evaluating long-form information alignment that models relationships between facts. DoveScore outperforms existing fine-grained methods and addresses vulnerabilities in current evaluators that the paper's MontageLie benchmark exposes.

Contribution

We propose DoveScore, a novel evaluation framework that jointly verifies factual accuracy and event-order consistency, improving robustness over existing methods.
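Checking facts one at a time cannot detect a response that states only true events but puts them in a misleading order. Adding an order check can catch it. The sketch below illustrates that idea under explicit assumptions. It is NOT DoveScore's actual formula. The input format (each response event mapped to the index of its matching reference event, or None), the pairwise order-agreement measure, and the product combination in the hypothetical `joint_score` are all illustrative choices made for this example.

```python
from itertools import combinations
from typing import List, Optional

# Hypothetical input format: for each event in the response (in response order),
# the index of the matching reference event, or None if no reference event
# supports it. Producing this alignment (e.g. with an LLM verifier) is assumed
# to happen upstream and is not shown here.
Alignment = List[Optional[int]]


def fact_precision(alignment: Alignment) -> float:
    """Share of response events supported by the reference (order-blind)."""
    if not alignment:
        return 0.0
    return sum(p is not None for p in alignment) / len(alignment)


def order_consistency(alignment: Alignment) -> float:
    """Share of supported event pairs whose response order matches the
    reference order. Two events matched to the same reference event count
    as a disagreement."""
    supported = [p for p in alignment if p is not None]
    pairs = list(combinations(supported, 2))  # pairs keep response order
    if not pairs:
        return 1.0  # fewer than two supported events: nothing to contradict
    return sum(a < b for a, b in pairs) / len(pairs)


def joint_score(alignment: Alignment) -> float:
    """Hypothetical combination: penalize a response if either check fails."""
    return fact_precision(alignment) * order_consistency(alignment)


faithful = [0, 1, 2, 3]  # all events supported, original order
montage = [2, 0, 1, 3]   # the same true events, rearranged
print(fact_precision(montage), round(order_consistency(montage), 3))  # prints: 1.0 0.667
```

The rearranged response still gets perfect fact precision, because every event in it is true. Only the order term lowers its score. This is the kind of gap that joint verification is meant to close.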

Findings

1. DoveScore outperforms existing fine-grained methods by over 8%.
2. The MontageLie benchmark exposes vulnerabilities in current evaluators.
3. Both coarse-grained LLM-based evaluators and existing fine-grained frameworks score below 65% AUC-ROC on MontageLie.
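To interpret a "below 65% AUC-ROC" result, it helps to know what the metric measures. AUC-ROC is the probability that the evaluator gives a randomly chosen positive example a higher score than a randomly chosen negative one. A value of 0.5 means chance and 1.0 means perfect separation. The sketch below computes the metric directly from that pairwise definition. It treats aligned responses as the positive class and montaged responses as the negative class, which is an assumption about how the benchmark is scored. The scores in the example are made up for illustration. For large evaluation sets, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.

```python
from typing import Sequence


def auc_roc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC-ROC via its rank interpretation: the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative one,
    with ties counted as half a win."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up evaluator scores, for illustration only.
aligned = [0.9, 0.8, 0.7]     # responses consistent with the source
montaged = [0.85, 0.6, 0.75]  # true statements rearranged into a misleading narrative
print(round(auc_roc(aligned, montaged), 3))  # prints 0.667
```

Under this reading, an evaluator below 0.65 ranks an aligned response above a montaged one only modestly more often than a coin flip would.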

Abstract

Information alignment evaluators are vital for various NLG evaluation tasks and trustworthy LLM deployment, reducing hallucinations and enhancing user trust. Current fine-grained methods, like FactScore, verify facts individually but neglect inter-fact dependencies, enabling subtle vulnerabilities. In this work, we introduce MontageLie, a challenging benchmark that constructs deceptive narratives by "montaging" truthful statements without introducing explicit hallucinations. We demonstrate that both coarse-grained LLM-based evaluators and current fine-grained frameworks are susceptible to this attack, with AUC-ROC scores falling below 65%. To enable more robust fine-grained evaluation, we propose DoveScore, a novel framework that jointly verifies factual accuracy and event-order consistency. By modeling inter-fact relationships, DoveScore outperforms existing fine-grained methods by…