Audio-visual Representation Learning for Anomaly Events Detection in Crowds

Junyu Gao; Maoguo Gong; Xuelong Li

arXiv:2110.14862 · cs.CV · October 29, 2021 · 6 cites

Open Access

TL;DR

This paper proposes a multi-modal audio-visual learning approach using a two-branch neural network to improve anomaly event detection in crowds, demonstrating that incorporating audio signals enhances detection accuracy over visual-only methods.

Contribution

It introduces a novel two-branch network combining visual and audio features for crowd anomaly detection, outperforming existing state-of-the-art methods on the SHADE dataset.

Findings

01. Audio signals significantly improve detection accuracy.

02. The proposed method outperforms existing approaches.

03. Fusion of audio and visual features enhances robustness.

Abstract

In recent years, anomaly event detection in crowd scenes has attracted many researchers' attention because of its importance to public safety. Existing methods usually exploit visual information to analyze whether any abnormal events have occurred, because public places are generally equipped only with visual sensors. However, when an abnormal event occurs in a crowd, sound information may be discriminative and can help the crowd analysis system determine whether there is an abnormality. Compared with visual information, which is easily occluded, audio signals have a certain degree of penetration. Thus, this paper attempts to exploit multi-modal learning to model the audio and visual signals simultaneously. To be specific, we design a two-branch network to model different types of information. The first is a typical 3D CNN model to extract temporal appearance features from video clips. The…
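The two-branch idea can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated before it describes the audio branch, so the single dense layer per branch, the fusion by concatenation, the sigmoid scoring head, and every name and dimension below are assumptions chosen only to show how two modality-specific feature vectors might be combined into one anomaly score. A real system would replace the visual branch with a 3D CNN over video clips.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class TwoBranchFusion:
    """Hypothetical toy model: each branch is one dense layer, not a 3D CNN."""

    def __init__(self, visual_dim, audio_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.visual = init_layer(rng, visual_dim, hidden_dim)
        self.audio = init_layer(rng, audio_dim, hidden_dim)
        # The head sees both embeddings side by side (assumed fusion scheme).
        self.head = init_layer(rng, 2 * hidden_dim, 1)

    def forward(self, visual_feat, audio_feat):
        v = relu(linear(visual_feat, *self.visual))
        a = relu(linear(audio_feat, *self.audio))
        fused = v + a  # list concatenation, i.e. feature concatenation
        logit = linear(fused, *self.head)[0]
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid score in (0, 1)


model = TwoBranchFusion(visual_dim=16, audio_dim=4)
score = model.forward([0.5] * 16, [0.2] * 4)
print(f"anomaly score: {score:.3f}")
```

Because the weights are random and untrained, the printed score carries no meaning; the sketch only shows the data flow from two inputs to one output.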

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Music and Audio Processing · Video Surveillance and Tracking Methods

Methods: 3 Dimensional Convolutional Neural Network