Training-Free and Interpretable Hateful Video Detection via Multi-stage Adversarial Reasoning
Shuonan Yang, Yuchen Zhang, Zeyu Fu

TL;DR
MARS is a training-free, multi-stage adversarial reasoning framework for hateful video detection that produces reliable, interpretable results and outperforms other training-free methods on real-world datasets.
Contribution
This paper introduces MARS, a novel training-free framework that combines evidence and counter-evidence reasoning for interpretable hateful video detection.
Findings
Achieves up to 10% improvement over other training-free methods.
Outperforms state-of-the-art training-based methods on one dataset.
Provides human-understandable justifications for decisions.
Abstract
Hateful videos pose serious risks by amplifying discrimination, inciting violence, and undermining online safety. Existing training-based hateful video detection methods are constrained by limited training data and a lack of interpretability, while directly prompting large vision-language models often struggles to deliver reliable hate detection. To address these challenges, this paper introduces MARS, a training-free Multi-stage Adversarial ReaSoning framework that enables reliable and interpretable hateful content detection. MARS begins with an objective description of the video content, establishing a neutral foundation for subsequent analysis. Building on this, it develops evidence-based reasoning that supports potential hateful interpretations, while in parallel incorporating counter-evidence reasoning to capture plausible non-hateful perspectives. Finally, these perspectives are…
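The staged flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Model` callable, the `Verdict` type, the prompt wording, and the `run_pipeline` function are all hypothetical names introduced here. The abstract is cut off before it describes the final step, so the combination stage below is an assumed placeholder. A real system would also pass video frames and audio to the model, which this text-only sketch omits.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: takes a text prompt, returns text.
# (Assumption: a real vision-language model would also receive the video.)
Model = Callable[[str], str]


@dataclass
class Verdict:
    description: str       # stage 1 output
    evidence: str          # stage 2 output
    counter_evidence: str  # stage 3 output
    label: str             # "hateful" or "non-hateful"
    rationale: str         # raw text of the final decision


def run_pipeline(video_ref: str, model: Model) -> Verdict:
    # Stage 1: neutral, judgment-free description of the content.
    description = model(
        f"Describe the content of {video_ref} objectively, without judgment."
    )

    # Stage 2: reasoning that supports a hateful interpretation.
    evidence = model(
        "List evidence that this content could be hateful.\n"
        f"Description: {description}"
    )

    # Stage 3: reasoning that supports a non-hateful interpretation.
    # It sees only the description, not the stage 2 output, so the two
    # stages are independent and could be run in parallel.
    counter_evidence = model(
        "List evidence that this content could be non-hateful.\n"
        f"Description: {description}"
    )

    # Stage 4 (assumed placeholder): weigh both sides and decide.
    rationale = model(
        "Weigh both sides and answer 'hateful' or 'non-hateful' first, "
        "then justify.\n"
        f"Evidence: {evidence}\nCounter-evidence: {counter_evidence}"
    )
    is_hateful = rationale.strip().lower().startswith("hateful")
    label = "hateful" if is_hateful else "non-hateful"

    return Verdict(description, evidence, counter_evidence, label, rationale)
```

Keeping the intermediate outputs in the returned record is what makes the decision inspectable: a reader can see the description, both lines of reasoning, and the final justification side by side.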
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
