Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision
Peng Wu, Jing Liu, Yujia Shi, Yujia Sun, Fangtao Shao, Zhaoyang Wu, Zhiwei Yang

TL;DR
This paper introduces a large-scale multimodal violence detection dataset and a neural network that leverages audio-visual data and relation modeling to improve detection accuracy in untrimmed videos.
Contribution
The work provides a new multi-scene dataset and a novel neural network architecture that captures diverse relations among video snippets for weakly supervised violence detection.
Findings
The proposed method outperforms state-of-the-art methods on the new XD-Violence dataset.
Multimodal input significantly improves detection accuracy.
Relation modeling enhances the understanding of video context.
Abstract
Violence detection has been studied in computer vision for years. However, previous work is either superficial, e.g., classification of short clips and a single scenario, or undersupplied, e.g., a single modality and multimodality based on hand-crafted features. To address this problem, in this work we first release a large-scale and multi-scene dataset named XD-Violence with a total duration of 217 hours, containing 4754 untrimmed videos with audio signals and weak labels. Then we propose a neural network containing three parallel branches to capture different relations among video snippets and integrate features, where the holistic branch captures long-range dependencies using a similarity prior, the localized branch captures local positional relations using a proximity prior, and the score branch dynamically captures the closeness of predicted scores. Besides, our method also includes an…
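The three relation types named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the cosine-similarity threshold, the exponential distance decay, the `1 - |s_i - s_j|` score closeness, the row normalization, and every function and parameter name below are assumptions chosen only to show how a similarity prior, a proximity prior, and score closeness could each produce a snippet-to-snippet weight matrix.

```python
import math


def row_normalize(matrix):
    """Scale each row to sum to 1 (rows that sum to 0 are left as zeros)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def similarity_adjacency(features, threshold=0.5):
    """Holistic-style relation (assumed form): thresholded cosine similarity."""
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in features]
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            cos = dot / (norms[i] * norms[j])
            # Keep only strongly similar pairs; the threshold is hypothetical.
            adj[i][j] = cos if cos > threshold else 0.0
    return row_normalize(adj)


def proximity_adjacency(num_snippets, sigma=1.0):
    """Localized-style relation (assumed form): weight decays with temporal distance."""
    adj = [
        [math.exp(-abs(i - j) / sigma) for j in range(num_snippets)]
        for i in range(num_snippets)
    ]
    return row_normalize(adj)


def score_adjacency(scores):
    """Score-style relation (assumed form): snippets with close scores get high weight."""
    adj = [[1.0 - abs(si - sj) for sj in scores] for si in scores]
    return row_normalize(adj)


def aggregate(adj, features):
    """Mix snippet features using the relation weights (plain matrix product)."""
    dim = len(features[0])
    return [
        [sum(w * f[d] for w, f in zip(row, features)) for d in range(dim)]
        for row in adj
    ]


# Toy input: three snippets with 2-D features and per-snippet scores in [0, 1].
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
scores = [0.9, 0.8, 0.1]

holistic = aggregate(similarity_adjacency(features), features)
localized = aggregate(proximity_adjacency(len(features)), features)
score_based = aggregate(score_adjacency(scores), features)
```

In this toy setup, the similarity matrix links the first two snippets because their features point in nearly the same direction, the proximity matrix favors temporal neighbors regardless of content, and the score matrix groups snippets whose predicted scores are close.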
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
