Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method

Wenping Jin; Li Zhu; Jing Sun

arXiv:2501.07496·cs.CV·March 17, 2025

TL;DR

This paper introduces a weakly supervised multimodal violence detection method that aligns the semantic features of audio and optical flow with those of RGB before fusing them, reaching 86.07% AP on the XD-Violence dataset.

Contribution

The paper proposes a new multimodal semantic feature alignment technique that leverages modality discrepancies, enhancing weakly supervised violence detection performance.

Findings

01. Achieved 86.07% AP on the XD-Violence dataset.

02. Effectively aligns modalities to exploit complementary information.

03. Outperforms existing weakly supervised methods.
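
For readers unfamiliar with the AP metric quoted above, here is a minimal sketch of how average precision can be computed from binary labels and predicted scores. The function name `average_precision` and the toy inputs are illustrative assumptions; this is not the paper's evaluation code, and library implementations may handle tied scores differently.

```python
def average_precision(labels, scores):
    """Average precision for binary labels (1 = violent) and real-valued scores.

    Illustrative helper, not taken from the paper or its repository.
    """
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank items from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            # Precision measured at the rank of each positive item.
            precision_sum += hits / rank
    return precision_sum / num_pos


# Toy example: two violent frames among four.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

A perfect ranking, with every positive scored above every negative, yields an AP of 1.0.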

Abstract

Weakly supervised violence detection refers to the technique of training models to identify violent segments in videos using only video-level labels. Among these approaches, multimodal violence detection, which integrates modalities such as audio and optical flow, holds great potential. Existing methods in this domain primarily focus on designing multimodal fusion models to address modality discrepancies. In contrast, we take a different approach, leveraging the inherent discrepancies across modalities in violence event representation to propose a novel multimodal semantic feature alignment method. This method sparsely maps the semantic features of local, transient, and less informative modalities (such as audio and optical flow) into the more informative RGB semantic feature space. Through an iterative process, the method identifies the suitable non-zero feature matching subspace and…
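
To make "using only video-level labels" concrete, the sketch below shows one common weakly supervised formulation: snippet scores are pooled into a single video score by averaging the top-k values, and that score is compared with the video-level label. This is an assumption for illustration only. The abstract does not state which training objective the paper uses, and the names `video_score`, `bce`, and `weak_loss` are hypothetical.

```python
import math


def video_score(snippet_scores, k):
    """Pool per-snippet violence scores into one video-level score.

    Uses the mean of the k highest snippet scores (a common choice,
    assumed here; not confirmed as the paper's method).
    """
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / len(top)


def bce(p, y, eps=1e-7):
    """Binary cross-entropy between probability p and label y (0 or 1)."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def weak_loss(snippet_scores, video_label, k=2):
    """Loss that needs only the video-level label, never snippet labels."""
    return bce(video_score(snippet_scores, k), video_label)


# Four snippets, of which two look violent; the video is labeled violent.
loss = weak_loss([0.1, 0.9, 0.7, 0.2], video_label=1, k=2)
```

At inference time the per-snippet scores themselves localize the violent segments, even though no snippet-level label was used in training.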

Code & Models

Repositories

xjpp2016/mavd (PyTorch, official)