Multimodal Urban Sound Tagging with Spatiotemporal Context

Jisheng Bai; Jianfeng Chen; Mou Wang

arXiv:2011.00175 · eess.AS · November 22, 2023 · IEEE Trans. Cogn. Dev. Syst.

TL;DR

This paper introduces a multimodal urban sound tagging system that integrates audio features with spatiotemporal context, significantly improving noise pollution monitoring accuracy in urban environments.

Contribution

The study presents a novel multimodal approach combining audio and spatiotemporal data, with a data filtering technique, to enhance urban sound tagging performance.

Findings

01. Effective integration of spatiotemporal context improves sound classification accuracy.

02. The proposed method outperforms previous approaches on the DCASE2020 UST challenge.

03. Data filtering enhances multimodal learning effectiveness.

Abstract

Noise pollution significantly affects our daily life and urban development. Urban Sound Tagging (UST), which aims to analyze and monitor urban noise pollution, has attracted much attention recently. One weakness of previous UST studies is that the spatial and temporal context of sound signals, which contains complementary information about when and where the audio data was recorded, has not been investigated. To address this problem, we propose a multimodal UST system that deeply mines the audio and spatiotemporal context together. To incorporate the characteristics of different acoustic features, two sets of four spectrograms are first extracted as the inputs of residual neural networks. Then, the spatiotemporal context is encoded and combined with acoustic features to explore the efficiency of multimodal learning for discriminating sound signals. Moreover, a…
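
The abstract says the spatiotemporal context is encoded and combined with acoustic features, but the excerpt here does not say how. As a minimal sketch only, the Python below assumes one common choice: one-hot encoding of the recording hour, the weekday, and a location ID, fused with an audio embedding by concatenation. The function names, the vector layout, and the 4-dimensional stand-in embedding are all hypothetical and are not taken from the paper.

```python
from datetime import datetime


def one_hot(index, size):
    """Return a list of length `size` with a 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_context(timestamp, location_id, num_locations):
    """Encode when and where a clip was recorded as one flat vector.

    Assumed (hypothetical) layout:
    24 hour slots + 7 weekday slots + `num_locations` location slots.
    """
    return (
        one_hot(timestamp.hour, 24)
        + one_hot(timestamp.weekday(), 7)
        + one_hot(location_id, num_locations)
    )


def fuse(audio_embedding, context_vector):
    """Fuse by concatenation: audio features first, then context features."""
    return list(audio_embedding) + list(context_vector)


# Stand-in audio embedding; a real system would take this from the network
# that processes the spectrograms.
audio_embedding = [0.12, -0.40, 0.88, 0.05]

# A clip recorded on Monday 2 March 2020 at 14:30 by (hypothetical) sensor 3 of 5.
context = encode_context(datetime(2020, 3, 2, 14, 30), location_id=3, num_locations=5)
fused = fuse(audio_embedding, context)
```

Concatenation is the simplest way to let a classifier see both modalities at once. Whether the paper fuses the two this way or uses a different scheme cannot be determined from the truncated abstract.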

Taxonomy

Topics: Noise Effects and Management · Music and Audio Processing · Speech and Audio Processing