Tool-MAD: A Multi-Agent Debate Framework for Fact Verification with Diverse Tool Augmentation and Adaptive Retrieval

Seyeon Jeong; Yeonjun Choi; JongWook Kim; Beakcheol Jang

arXiv:2601.04742 · cs.CL · January 9, 2026

TL;DR

Tool-MAD is a multi-agent debate framework that enhances fact verification by integrating diverse external tools, adaptive retrieval, and quantitative hallucination detection, leading to improved accuracy and robustness across benchmarks and domains.

Contribution

It introduces a novel multi-agent debate system with heterogeneous external tools, adaptive query refinement, and quantitative hallucination detection, advancing the state of the art in fact verification.

Findings

01. Achieves up to a 5.5% accuracy improvement over previous MAD frameworks.

02. Demonstrates robustness and adaptability in medical-domain fact verification.

03. Outperforms existing methods on four fact verification benchmarks.

Abstract

Large Language Models (LLMs) suffer from hallucinations and factual inaccuracies, especially in complex reasoning and fact verification tasks. Multi-Agent Debate (MAD) systems aim to improve answer accuracy by enabling multiple LLM agents to engage in dialogue, promoting diverse reasoning and mutual verification. However, existing MAD frameworks primarily rely on internal knowledge or static documents, making them vulnerable to hallucinations. While MADKE introduces external evidence to mitigate this, its one-time retrieval mechanism limits adaptability to new arguments or emerging information during the debate. To address these limitations, we propose Tool-MAD, a multi-agent debate framework that enhances factual verification by assigning each agent a distinct external tool, such as a search API or RAG module. Tool-MAD introduces three key innovations: (1) a multi-agent debate…
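
The abstract does not specify Tool-MAD's prompts, tools, or judging procedure, so what follows is only a minimal sketch of the general idea under stated assumptions: each agent owns a distinct tool, and retrieval is repeated every round with a query refined by the opponent's latest position, in contrast to a one-time retrieval step. The names `Agent`, `debate`, `search_stub`, `rag_stub`, and `max_rounds`, the stance labels, the early-stop rule, and the majority-vote judge are all hypothetical stand-ins for LLM calls and real search/RAG backends.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical evidence format: (snippet text, stance label).
Evidence = Tuple[str, str]
# Hypothetical tool interface: query string -> evidence list.
# Stands in for a search API or a RAG module.
Tool = Callable[[str], List[Evidence]]


@dataclass
class Agent:
    name: str
    tool: Tool  # each agent owns a distinct external tool
    evidence: List[Evidence] = field(default_factory=list)

    def refine_query(self, claim: str, opponent_last: str) -> str:
        # Hypothetical refinement: fold the opponent's latest position into the query.
        return f"{claim} | rebut: {opponent_last}" if opponent_last else claim

    def argue(self, claim: str, opponent_last: str) -> str:
        # Adaptive retrieval: call the tool again every round with a refined query,
        # keeping only evidence not seen before.
        for item in self.tool(self.refine_query(claim, opponent_last)):
            if item not in self.evidence:
                self.evidence.append(item)
        # Stand-in for an LLM argument: take the stance with the most evidence so far.
        stances = Counter(stance for _, stance in self.evidence)
        return stances.most_common(1)[0][0] if stances else "NOT ENOUGH INFO"


def debate(claim: str, agents: List[Agent], max_rounds: int = 3) -> str:
    last: Dict[str, str] = {a.name: "" for a in agents}
    for _ in range(max_rounds):
        for i, agent in enumerate(agents):
            # Each agent responds to the next agent's most recent position.
            opponent = agents[(i + 1) % len(agents)]
            last[agent.name] = agent.argue(claim, last[opponent.name])
        if len(set(last.values())) == 1:  # stop early once all agents agree
            break
    # Stand-in for a judge: majority vote over the agents' final positions.
    return Counter(last.values()).most_common(1)[0][0]


def search_stub(query: str) -> List[Evidence]:
    # Hypothetical search tool: a refined (rebuttal) query surfaces new evidence.
    if "rebut" in query:
        return [("official record contradicts the claim", "REFUTES"),
                ("fact-check article", "REFUTES")]
    return [("blog post repeating the claim", "SUPPORTS")]


def rag_stub(query: str) -> List[Evidence]:
    # Hypothetical RAG tool over a fixed corpus: same passage for any query.
    return [("encyclopedia passage", "REFUTES")]


search_agent = Agent("search", search_stub)
rag_agent = Agent("rag", rag_stub)
verdict = debate("Claim X", [search_agent, rag_agent])
print(verdict)  # REFUTES
```

In this toy run, the search agent's first query returns only a supporting blog post. After the RAG agent pushes back, the refined query surfaces refuting evidence, the search agent changes its position, and the debate stops in round two. A single up-front retrieval would have left the search agent arguing from the blog post alone.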

Taxonomy

Topics: Topic Modeling · Misinformation and Its Impacts · Multimodal Machine Learning Applications