MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection

Weihai Lu; Zhejun Zhao; Yanshu Li; Huan He

arXiv:2604.27934 · cs.AI · May 1, 2026

TL;DR

MM-StanceDet is a multi-agent framework that enhances multimodal stance detection by integrating retrieval augmentation, specialized analysis, debate reasoning, and self-reflection, significantly outperforming existing methods.

Contribution

It introduces a novel multi-agent architecture with structured reasoning stages and retrieval augmentation for improved multimodal stance detection.

Findings

01. Outperforms state-of-the-art baselines on five datasets.

02. Effectively handles conflicting signals in multimodal data.

03. Demonstrates robustness through structured reasoning and self-reflection.

Abstract

Multimodal Stance Detection (MSD) is crucial for understanding public discourse, yet effectively fusing text and images, especially when their signals conflict, remains challenging. Existing methods often struggle with contextual grounding, ambiguity in cross-modal interpretation, and the fragility of single-pass reasoning. To address these issues, we propose Retrieval-Augmented Multi-modal Multi-agent Stance Detection (MM-StanceDet), a novel multi-agent framework integrating Retrieval Augmentation for contextual grounding, specialized Multimodal Analysis agents for nuanced interpretation, a Reasoning-Enhanced Debate stage for exploring perspectives, and Self-Reflection for robust adjudication. Extensive experiments on five datasets demonstrate that MM-StanceDet significantly outperforms state-of-the-art baselines, validating the efficacy of its multi-agent architecture and structured reasoning stages…
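The four stages named in the abstract (retrieval, multimodal analysis, debate, self-reflection) can be pictured as a simple sequential pipeline. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify prompts, agent roles, the retrieval method, or the number of debate rounds. The names `Post`, `Trace`, `retrieve`, `run_pipeline`, and `stub_agent` are hypothetical, the retrieval is a toy word-overlap ranking, the image is represented by a caption string, and the "agent" is a deterministic stub standing in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

STANCES = ("favor", "against", "neutral")

# Hypothetical agent interface: prompt in, text out (a model call in practice).
Agent = Callable[[str], str]


@dataclass
class Post:
    text: str
    image_caption: str  # assumption: a caption stands in for the actual image
    target: str


@dataclass
class Trace:
    context: List[str] = field(default_factory=list)
    analyses: Dict[str, str] = field(default_factory=dict)
    arguments: Dict[str, str] = field(default_factory=dict)
    stance: str = "neutral"


def retrieve(post: Post, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retrieval: rank documents by word overlap with the post and target."""
    query = set((post.text + " " + post.target).lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def run_pipeline(post: Post, corpus: List[str], agent: Agent) -> Trace:
    trace = Trace()

    # Stage 1: retrieval augmentation for contextual grounding.
    trace.context = retrieve(post, corpus)
    ctx = " | ".join(trace.context)

    # Stage 2: one specialized analysis per modality.
    trace.analyses["text"] = agent(
        f"[text-analyst] target={post.target} context={ctx} text={post.text}"
    )
    trace.analyses["image"] = agent(
        f"[image-analyst] target={post.target} context={ctx} image={post.image_caption}"
    )

    # Stage 3: debate, with one advocate arguing for each candidate stance.
    evidence = " ".join(trace.analyses.values())
    for stance in STANCES:
        trace.arguments[stance] = agent(f"[debater:{stance}] evidence={evidence}")

    # Stage 4: a judge reviews the arguments and adjudicates.
    verdict = agent(f"[judge] arguments={trace.arguments}").strip().lower()
    trace.stance = verdict if verdict in STANCES else "neutral"
    return trace


def stub_agent(prompt: str) -> str:
    """Deterministic stand-in for a model: the judge always answers 'Against'."""
    if prompt.startswith("[judge]"):
        return "Against"
    return "stub reply to " + prompt.split("]")[0] + "]"
```

Calling `run_pipeline(post, corpus, stub_agent)` returns a `Trace` that records every intermediate output, which makes it easy to inspect how a conflicting text and image signal would propagate through the stages. Swapping `stub_agent` for a real model client is the only change needed to experiment with actual prompts.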
