MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection
Weihai Lu, Zhejun Zhao, Yanshu Li, Huan He

TL;DR
MM-StanceDet is a multi-agent framework that enhances multimodal stance detection by integrating retrieval augmentation, specialized analysis, debate reasoning, and self-reflection, significantly outperforming existing methods.
Contribution
It introduces a novel multi-agent architecture with structured reasoning stages and retrieval augmentation for improved multimodal stance detection.
Findings
Outperforms state-of-the-art baselines on five datasets.
Effectively handles conflicting signals in multimodal data.
Demonstrates robustness through structured reasoning and self-reflection.
Abstract
Multimodal Stance Detection (MSD) is crucial for understanding public discourse, yet effectively fusing text and image, especially when their signals conflict, remains challenging. Existing methods often struggle with contextual grounding, cross-modal interpretation ambiguity, and single-pass reasoning fragility. To address these issues, we propose Retrieval-Augmented Multi-modal Multi-agent Stance Detection (MM-StanceDet), a novel multi-agent framework integrating Retrieval Augmentation for contextual grounding, specialized Multimodal Analysis agents for nuanced interpretation, a Reasoning-Enhanced Debate stage for exploring perspectives, and Self-Reflection for robust adjudication. Extensive experiments on five datasets demonstrate that MM-StanceDet significantly outperforms state-of-the-art baselines, validating the efficacy of its multi-agent architecture and structured reasoning stages…
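The four stages named in the abstract (retrieval, specialized analysis, debate, self-reflection) can be pictured as a simple sequential pipeline. The sketch below is only an illustration of that control flow under stated assumptions, not the authors' implementation: the summary does not give the agent roles, prompts, retriever, or label set, so `Post`, `Trace`, `retrieve`, `detect_stance`, the one-debater-per-label layout, the word-overlap retriever, and the `stub_llm` stand-in are all hypothetical. The image is represented by a caption string purely to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical model interface: prompt text in, response text out.
LLM = Callable[[str], str]


@dataclass
class Post:
    text: str
    image_caption: str  # assumption: a caption stands in for the real image
    target: str


@dataclass
class Trace:
    context: List[str] = field(default_factory=list)
    analyses: List[str] = field(default_factory=list)
    arguments: List[str] = field(default_factory=list)
    stance: str = ""


def retrieve(post: Post, corpus: Sequence[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by word overlap with the post and target."""
    query = set((post.text + " " + post.target).lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query & set(doc.lower().split())),
        reverse=True,
    )
    return list(ranked[:k])


def detect_stance(
    post: Post,
    corpus: Sequence[str],
    llm: LLM,
    labels: Sequence[str] = ("favor", "against", "neutral"),
) -> Trace:
    trace = Trace()

    # Stage 1: retrieval augmentation for contextual grounding.
    trace.context = retrieve(post, corpus)
    ctx = " | ".join(trace.context)

    # Stage 2: one analysis agent per modality (assumed split).
    trace.analyses = [
        llm(f"[text-analyst] target={post.target} context={ctx} text={post.text}"),
        llm(f"[image-analyst] target={post.target} context={ctx} image={post.image_caption}"),
    ]

    # Stage 3: debate, here with one advocate per candidate label (assumed).
    evidence = " ".join(trace.analyses)
    trace.arguments = [
        llm(f"[debater:{label}] argue stance={label} given {evidence}")
        for label in labels
    ]

    # Stage 4: a reflective judge reviews all arguments and picks one label.
    verdict = llm(
        f"[judge] choose one of {list(labels)} after reviewing: "
        + " || ".join(trace.arguments)
    ).strip().lower()
    # Fall back to the last label if the judge answers off-list.
    trace.stance = verdict if verdict in labels else labels[-1]
    return trace


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, used only to run the sketch."""
    if prompt.startswith("[judge]"):
        return "against"
    return "note: " + prompt[:30]


corpus = [
    "the carbon tax bill passed the senate",
    "local bakery wins award",
    "critics say the carbon tax hurts workers",
]
post = Post(
    text="Great, another carbon tax",
    image_caption="an empty wallet",
    target="carbon tax",
)
trace = detect_stance(post, corpus, stub_llm)
print(trace.stance)
```

Keeping every intermediate output in a `Trace` object is a design choice made for this sketch: it lets the final adjudication step, and any later inspection, see what each earlier stage produced.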
