METER: Multi-modal Evidence-based Thinking and Explainable Reasoning -- Algorithm and Benchmark

Xu Yang; Qi Zhang; Shuming Jiang; Yaowen Xu; Zhaofan Zou; Hao Sun; Xuelong Li

arXiv:2507.16206 · cs.LG · July 23, 2025

TL;DR

METER introduces a comprehensive multi-modal benchmark and a novel training strategy for interpretable forgery detection across images, videos, audio, and audiovisual content, addressing the need for detailed explanations and cross-modal analysis in synthetic media detection.

Contribution

The paper presents METER, a unified benchmark with rich interpretability metrics and a new Chain-of-Thought training approach for improved, explainable forgery detection across multiple media modalities.

Findings

1. METER covers four media modalities with detailed interpretability metrics.
2. The proposed CoT training strategy enhances detection accuracy and explanation quality.
3. METER sets a new standard for cross-modal, interpretable forgery detection benchmarks.

Abstract

With the rapid advancement of generative AI, synthetic content across images, videos, and audio has become increasingly realistic, amplifying the risk of misinformation. Existing detection approaches predominantly focus on binary classification while lacking detailed and interpretable explanations of forgeries, which limits their applicability in safety-critical scenarios. Moreover, current methods often treat each modality separately, without a unified benchmark for cross-modal forgery detection and interpretation. To address these challenges, we introduce METER, a unified, multi-modal benchmark for interpretable forgery detection spanning images, videos, audio, and audio-visual content. Our dataset comprises four tracks, each requiring not only real-vs-fake classification but also evidence-chain-based explanations, including spatio-temporal localization, textual rationales, and…
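To make the idea of an evidence-chain annotation more concrete, the sketch below shows one possible way to represent a single benchmark sample and to score the localization part of an explanation. It is a minimal illustration under stated assumptions, not METER's actual data format or official metric: the `EvidenceSample` fields, the `temporal_iou`, `box_iou`, and `mean_best_iou` helpers, and the averaging rule are all hypothetical. Intersection-over-union is used here only because it is a common generic choice for scoring temporal and spatial localization.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[float, float]             # (start_s, end_s)
Box = Tuple[float, float, float, float]    # (x1, y1, x2, y2)


# Hypothetical schema: field names are illustrative, not METER's real format.
@dataclass
class EvidenceSample:
    track: str                  # e.g. "image", "video", "audio", "audio-visual"
    is_fake: bool               # real-vs-fake label
    segments: List[Interval] = field(default_factory=list)  # forged time spans
    boxes: List[Box] = field(default_factory=list)          # forged image regions
    rationale: str = ""         # free-text explanation of the forgery evidence


def temporal_iou(a: Interval, b: Interval) -> float:
    """IoU of two [start, end] intervals; 0.0 when they do not overlap."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def box_iou(a: Box, b: Box) -> float:
    """IoU of two axis-aligned boxes; 0.0 when they do not overlap."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_best_iou(preds: Sequence, gts: Sequence,
                  iou_fn: Callable[[object, object], float]) -> float:
    """Hypothetical score: for each annotated region, take the IoU of the
    best-matching prediction, then average over annotated regions."""
    if not gts:
        # Nothing to localize (e.g. a real sample): reward empty predictions.
        return 1.0 if not preds else 0.0
    return sum(max((iou_fn(p, g) for p in preds), default=0.0)
               for g in gts) / len(gts)
```

For example, a video sample annotated as forged between 2 s and 5 s, with a predicted span of 3 s to 6 s, gives `temporal_iou((2.0, 5.0), (3.0, 6.0))` equal to 0.5 (2 s of overlap over a 4 s union). Keeping the label, the localization, and the rationale in one record mirrors the abstract's point that each track asks for more than a binary verdict.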

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies