Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions
Di Hu, Lichao Mou, Qingzhong Wang, Junyu Gao, Yuansheng Hua, Dejing Dou, Xiao Xiang Zhu

TL;DR
This paper introduces audiovisual crowd counting, combining visual and auditory data to improve accuracy in extreme conditions like low light and occlusion, supported by a new large-scale dataset and fusion method.
Contribution
It presents a novel audiovisual crowd counting task, a large-scale dataset (DISCO), and a linear feature-wise fusion module for integrating audio and visual features.
Findings
Auditory information improves crowd counting accuracy in challenging conditions.
The proposed fusion method effectively combines audio and visual features.
Experimental results demonstrate the benefit of audiovisual data over visual-only approaches.
Abstract
Visual crowd counting has been recently studied as a way to enable people counting in crowd scenes from images. Albeit successful, vision-based crowd counting approaches could fail to capture informative features in extreme conditions, e.g., imaging at night and occlusion. In this work, we introduce a novel task of audiovisual crowd counting, in which visual and auditory information are integrated for counting purposes. We collect a large-scale benchmark, named auDiovISual Crowd cOunting (DISCO) dataset, consisting of 1,935 images and the corresponding audio clips, and 170,270 annotated instances. In order to fuse the two modalities, we make use of a linear feature-wise fusion module that carries out an affine transformation on visual and auditory features. Finally, we conduct extensive experiments using the proposed dataset and approach. Experimental results show that introducing…
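The abstract describes the fusion module only as a linear, feature-wise affine transformation over visual and auditory features. The sketch below illustrates one common way to read that description: an audio embedding is mapped by two linear layers to a per-channel scale and shift, which are then applied to a visual feature map. The function name, the parameter names (w_gamma, b_gamma, w_beta, b_beta), and the tensor shapes are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def feature_wise_fusion(visual, audio, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical feature-wise affine fusion (illustrative only).

    visual:  (C, H, W) visual feature map
    audio:   (D,) audio embedding
    w_gamma, w_beta: (C, D) linear weights
    b_gamma, b_beta: (C,) linear biases
    """
    # Linear maps from the audio embedding to one scale and one shift per channel.
    gamma = w_gamma @ audio + b_gamma  # shape (C,)
    beta = w_beta @ audio + b_beta     # shape (C,)
    # Affine transformation, broadcast over the spatial dimensions.
    return gamma[:, None, None] * visual + beta[:, None, None]


# Toy shapes chosen for the example; they do not come from the paper.
rng = np.random.default_rng(0)
C, H, W, D = 4, 3, 3, 8
visual = rng.standard_normal((C, H, W))
audio = rng.standard_normal(D)

fused = feature_wise_fusion(
    visual,
    audio,
    w_gamma=0.1 * rng.standard_normal((C, D)),
    b_gamma=np.ones(C),
    w_beta=0.1 * rng.standard_normal((C, D)),
    b_beta=np.zeros(C),
)
print(fused.shape)
```

With zero weights, a scale bias of one, and a shift bias of zero, this sketch returns the visual features unchanged, so under these assumptions the audio branch acts as a learned correction on top of the visual representation.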
Taxonomy
Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications · Image Enhancement Techniques
