Audio-visual Representation Learning for Anomaly Events Detection in Crowds
Junyu Gao, Maoguo Gong, Xuelong Li

TL;DR
This paper proposes a multi-modal audio-visual learning approach using a two-branch neural network to improve anomaly event detection in crowds, demonstrating that incorporating audio signals enhances detection accuracy over visual-only methods.
Contribution
It introduces a novel two-branch network combining visual and audio features for crowd anomaly detection, outperforming existing state-of-the-art methods on the SHADE dataset.
Findings
Audio signals significantly improve detection accuracy over visual-only methods.
The proposed method outperforms existing state-of-the-art approaches on the SHADE dataset.
Fusion of audio and visual features enhances robustness.
Abstract
In recent years, anomaly event detection in crowd scenes has attracted many researchers' attention because of its importance to public safety. Existing methods usually rely on visual information to determine whether an abnormal event has occurred, because public places are generally equipped only with visual sensors. However, when an abnormal event occurs in a crowd, sound may provide discriminative cues that help the crowd analysis system decide whether there is an abnormality. Compared with visual information, which is easily occluded, audio signals have a certain degree of penetration. Thus, this paper attempts to exploit multi-modal learning to model the audio and visual signals simultaneously. Specifically, we design a two-branch network to model the different types of information. The first is a typical 3D CNN model to extract temporal appearance features from video clips. The…
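The two-branch idea described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the abstract is truncated before it describes the audio branch or the fusion step, so the log-spectrum audio branch, the concatenation-based fusion, and all names, shapes, and random weights here (`visual_branch`, `audio_branch`, `fuse_and_score`) are assumptions made for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def visual_branch(clip, kernels):
    """One naive 3D convolution layer over a grayscale clip of shape (T, H, W).

    kernels has shape (C, kt, kh, kw). Returns a (C,) feature vector.
    """
    # All (kt, kh, kw) spatio-temporal patches: shape (T', H', W', kt, kh, kw).
    windows = sliding_window_view(clip, kernels.shape[1:])
    # Correlate every patch with every kernel: shape (C, T', H', W').
    fmap = np.einsum("thwijk,cijk->cthw", windows, kernels)
    fmap = np.maximum(fmap, 0.0)  # ReLU
    return fmap.mean(axis=(1, 2, 3))  # global average pooling


def audio_branch(waveform, frame_len, proj):
    """Hypothetical audio branch (an assumption, not from the paper).

    Splits the waveform into frames, averages the log-magnitude spectrum
    over frames, and applies a linear projection followed by ReLU.
    """
    n_frames = len(waveform) // frame_len
    frames = waveform[: n_frames * frame_len].reshape(n_frames, frame_len)
    spec = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))
    return np.maximum(spec.mean(axis=0) @ proj, 0.0)


def fuse_and_score(v_feat, a_feat, w, b):
    """Assumed late fusion: concatenate both features, then a logistic unit."""
    fused = np.concatenate([v_feat, a_feat])
    return 1.0 / (1.0 + np.exp(-(fused @ w + b)))


# Toy inputs and untrained random weights, for shape-checking only.
rng = np.random.default_rng(0)
clip = rng.random((8, 16, 16))            # 8 frames of 16x16 pixels
waveform = rng.standard_normal(1024)      # 1024 audio samples
kernels = rng.standard_normal((4, 3, 3, 3)) * 0.1
proj = rng.standard_normal((65, 4)) * 0.1  # 128-sample frames give 65 rfft bins
w = rng.standard_normal(8) * 0.1

v_feat = visual_branch(clip, kernels)
a_feat = audio_branch(waveform, 128, proj)
score = fuse_and_score(v_feat, a_feat, w, 0.0)
```

With trained weights, `score` would be read as the probability that the clip contains an anomaly. Here the weights are random, so only the data flow through the two branches and the fusion step is meaningful.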
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Music and Audio Processing · Video Surveillance and Tracking Methods
Methods: 3-Dimensional Convolutional Neural Network
