Tool-MAD: A Multi-Agent Debate Framework for Fact Verification with Diverse Tool Augmentation and Adaptive Retrieval
Seyeon Jeong, Yeonjun Choi, JongWook Kim, Beakcheol Jang

TL;DR
Tool-MAD is a multi-agent debate framework that enhances fact verification by integrating diverse external tools, adaptive retrieval, and quantitative hallucination detection, leading to improved accuracy and robustness across benchmarks and domains.
Contribution
It introduces a novel multi-agent debate system with heterogeneous external tools, adaptive query refinement, and quantitative hallucination detection, advancing the state of the art in fact verification.
Findings
Achieves an accuracy improvement of up to 5.5% over previous MAD frameworks.
Demonstrates robustness and adaptability in medical domain fact verification.
Outperforms existing methods on four fact verification benchmarks.
Abstract
Large Language Models (LLMs) suffer from hallucinations and factual inaccuracies, especially in complex reasoning and fact verification tasks. Multi-Agent Debate (MAD) systems aim to improve answer accuracy by enabling multiple LLM agents to engage in dialogue, promoting diverse reasoning and mutual verification. However, existing MAD frameworks primarily rely on internal knowledge or static documents, making them vulnerable to hallucinations. While MADKE introduces external evidence to mitigate this, its one-time retrieval mechanism limits adaptability to new arguments or emerging information during the debate. To address these limitations, we propose Tool-MAD, a multi-agent debate framework that enhances factual verification by assigning each agent a distinct external tool, such as a search API or RAG module. Tool-MAD introduces three key innovations: (1) a multi-agent debate…
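The contrast the abstract draws between one-time retrieval (as in MADKE) and retrieval that adapts during the debate can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract is truncated, so everything here is an assumption: `Agent`, `refine_query`, `debate`, `argue`, and `judge` are hypothetical names. The two lambda "tools" are stubs standing in for a search API and a RAG module, and the LLM calls are replaced by toy functions. The sketch also omits the hallucination-detection component, because its details are not given here. What it shows is the control flow: each agent is bound to its own tool, and on every turn it re-queries that tool with a query refined by the opponent's latest argument, rather than retrieving once before the debate starts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical tool interface: takes a query string, returns evidence snippets.
Tool = Callable[[str], List[str]]
Turn = Tuple[int, str, str]  # (round index, agent name, argument text)


@dataclass
class Agent:
    name: str
    tool: Tool  # each agent is bound to its own, distinct external tool
    queries: List[str] = field(default_factory=list)  # log of every query issued


def refine_query(claim: str, opponent_arg: str) -> str:
    """Hypothetical refinement: fold the opponent's latest argument into the query."""
    return f"{claim} | {opponent_arg}" if opponent_arg else claim


def debate(
    claim: str,
    agents: List[Agent],
    argue: Callable[[Agent, str, List[str]], str],
    judge: Callable[[str, List[Turn]], str],
    rounds: int = 2,
) -> Tuple[str, List[Turn]]:
    transcript: List[Turn] = []
    latest: Dict[str, str] = {a.name: "" for a in agents}
    for r in range(rounds):
        for i, agent in enumerate(agents):
            opponent = agents[(i + 1) % len(agents)]
            query = refine_query(claim, latest[opponent.name])
            agent.queries.append(query)
            evidence = agent.tool(query)  # retrieval happens on every turn, not once up front
            argument = argue(agent, claim, evidence)
            latest[agent.name] = argument
            transcript.append((r, agent.name, argument))
    return judge(claim, transcript), transcript


# --- Toy usage: stub tools and stub "LLM" functions (all hypothetical) ---
search_agent = Agent("search", lambda q: [f"web result for '{q}'"])
rag_agent = Agent("rag", lambda q: [f"passage for '{q}'"])


def argue(agent: Agent, claim: str, evidence: List[str]) -> str:
    # A real system would prompt an LLM with the claim and evidence here.
    return f"{agent.name}: {evidence[0]}"


def judge(claim: str, transcript: List[Turn]) -> str:
    # Placeholder judge; a real system would have an LLM weigh the transcript.
    return "SUPPORTED" if transcript else "NOT ENOUGH INFO"


claim = "Water boils at 100 C at sea level"
verdict, transcript = debate(claim, [search_agent, rag_agent], argue, judge, rounds=2)
for turn in transcript:
    print(turn)
```

In this sketch, only the search agent's very first query is the bare claim. Every later query carries the other agent's most recent argument, which is the property one-time retrieval lacks.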
Taxonomy
Topics: Topic Modeling · Misinformation and Its Impacts · Multimodal Machine Learning Applications
