MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs

Yucheng Ning; Xixun Lin; Fang Fang; Yanan Cao

arXiv:2510.22967 · cs.CL · October 30, 2025

TL;DR

This paper introduces MAD-Fact, a multi-agent debate framework for evaluating the factual accuracy of long-form outputs from large language models, motivated by high-risk domains such as biomedicine, law, and education.

Contribution

The paper presents MAD-Fact, a debate-based multi-agent verification system; LongHalluQA, a Chinese long-form factuality dataset; and a fact importance hierarchy that captures the varying significance of claims in long-form texts.

Findings

01. Larger LLMs generally maintain higher factual consistency.

02. Domestic (Chinese-developed) models perform better on Chinese long-form content.

03. The framework effectively identifies factual inaccuracies in long-form texts.

Abstract

The widespread adoption of Large Language Models (LLMs) raises critical concerns about the factual accuracy of their outputs, especially in high-risk domains such as biomedicine, law, and education. Existing evaluation methods for short texts often fail on long-form content due to complex reasoning chains, intertwined perspectives, and cumulative information. To address this, we propose a systematic approach integrating large-scale long-form datasets, multi-agent verification mechanisms, and weighted evaluation metrics. We construct LongHalluQA, a Chinese long-form factuality dataset, and develop MAD-Fact, a debate-based multi-agent verification system. We introduce a fact importance hierarchy to capture the varying significance of claims in long-form texts. Experiments on two benchmarks show that larger LLMs generally maintain higher factual consistency, while domestic models excel on…
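
To make the idea of an importance-weighted factuality score concrete, here is a minimal Python sketch. It is not the paper's metric: the abstract does not give the formula, the tier names, or the weights, so `IMPORTANCE_WEIGHTS`, `Claim`, `majority_verdict`, and `weighted_factuality` are all hypothetical. The sketch assumes each claim has already received one true/false verdict per agent at the end of a debate, resolves them by simple majority, and weights each claim by its importance tier.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical importance tiers and weights (assumption: the abstract only
# says claims differ in significance, not how they are weighted).
IMPORTANCE_WEIGHTS = {"core": 3.0, "supporting": 2.0, "peripheral": 1.0}


@dataclass
class Claim:
    text: str
    importance: str        # key into IMPORTANCE_WEIGHTS
    verdicts: list[bool]   # one post-debate verdict per agent (assumed input)


def majority_verdict(verdicts: list[bool]) -> bool:
    """Return True only if strictly more agents support the claim than reject it."""
    counts = Counter(verdicts)
    return counts[True] > counts[False]


def weighted_factuality(claims: list[Claim]) -> float:
    """Share of total importance weight carried by majority-supported claims."""
    total = sum(IMPORTANCE_WEIGHTS[c.importance] for c in claims)
    if total == 0:
        return 0.0
    supported = sum(
        IMPORTANCE_WEIGHTS[c.importance]
        for c in claims
        if majority_verdict(c.verdicts)
    )
    return supported / total


if __name__ == "__main__":
    example = [
        Claim("A central claim.", "core", [True, True, False]),
        Claim("A minor detail.", "peripheral", [False, False, True]),
    ]
    print(weighted_factuality(example))
```

In this sketch a tied vote counts as unsupported, which is a conservative choice for high-risk domains; an unweighted score falls out as the special case where every tier has the same weight.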
