FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models
Hongzhan Lin, Yang Deng, Yuxuan Gu, Wenxuan Zhang, Jing Ma, See-Kiong Ng, Tat-Seng Chua

TL;DR
FACT-AUDIT is a dynamic, multi-agent framework that adaptively evaluates large language models' fact-checking abilities, including justification quality, providing a comprehensive and evolving assessment of their trustworthiness.
Contribution
This work introduces FACT-AUDIT, a novel adaptive, multi-agent framework that assesses LLMs' fact-checking performance beyond static datasets by incorporating justification analysis and iterative evaluation.
Findings
Effectively differentiates among state-of-the-art LLMs.
Provides insights into models' strengths and limitations.
Enhances fact-checking evaluation with dynamic, model-centric assessments.
Abstract
Large Language Models (LLMs) have significantly advanced fact-checking studies. However, existing automated fact-checking evaluation methods rely on static datasets and classification metrics, which fail to automatically evaluate justification production or to uncover the nuanced limitations of LLMs in fact-checking. In this work, we introduce FACT-AUDIT, an agent-driven framework that adaptively and dynamically assesses LLMs' fact-checking capabilities. Leveraging importance sampling principles and multi-agent collaboration, FACT-AUDIT generates adaptive and scalable datasets, performs iterative model-centric evaluations, and updates assessments based on model-specific responses. By incorporating justification production alongside verdict prediction, this framework provides a comprehensive and evolving audit of LLMs' factual reasoning capabilities, to investigate their…
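The abstract names importance sampling as one principle behind the framework but does not spell out how it is applied. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below draws test scenarios from a proposal distribution that oversamples categories suspected to be hard, then reweights each outcome by target/proposal so the failure-rate estimate still reflects the target distribution. The function name, the scenario labels, and all probabilities are hypothetical.

```python
import random


def importance_sampling_failure_rate(target, proposal, failure_prob, n_samples, seed=0):
    """Estimate the failure rate under `target` while sampling from `proposal`.

    All arguments are hypothetical illustrations, not taken from the paper:
      target       -- dict: scenario -> probability under the distribution of interest
      proposal     -- dict: scenario -> probability actually used for sampling
      failure_prob -- dict: scenario -> simulated chance the audited model fails
    """
    rng = random.Random(seed)
    scenarios = list(target)
    proposal_weights = [proposal[s] for s in scenarios]

    total = 0.0
    for _ in range(n_samples):
        # Draw a scenario from the proposal, which may favour harder cases.
        scenario = rng.choices(scenarios, weights=proposal_weights, k=1)[0]
        # Simulate whether the audited model fails on this test case.
        failed = 1.0 if rng.random() < failure_prob[scenario] else 0.0
        # Reweight by target/proposal to correct for the biased sampling.
        total += failed * target[scenario] / proposal[scenario]
    return total / n_samples


if __name__ == "__main__":
    # Hypothetical numbers chosen purely for illustration.
    target = {"scenario_a": 0.2, "scenario_b": 0.3, "scenario_c": 0.5}
    proposal = {"scenario_a": 0.6, "scenario_b": 0.25, "scenario_c": 0.15}
    failure_prob = {"scenario_a": 0.6, "scenario_b": 0.2, "scenario_c": 0.1}
    estimate = importance_sampling_failure_rate(target, proposal, failure_prob, 20000)
    print(f"estimated failure rate under target: {estimate:.3f}")
```

In an adaptive setting, the proposal would presumably be updated between rounds as the evaluated model's weaknesses become clearer; the fixed proposal here keeps the example minimal.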
Taxonomy
Topics: Topic Modeling
