SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context
Mark Cartwright, Jason Cramer, Ana Elisa Mendez Mendez, Yu Wang, Ho-Hsiang Wu, Vincent Lostanlen, Magdalena Fuentes, Graham Dove, Charlie Mydlarz, Justin Salamon, Oded Nov, and Juan Pablo Bello

TL;DR
SONYC-UST-V2 is a comprehensive urban sound tagging dataset with spatiotemporal metadata, enabling improved machine listening models for real-world noise monitoring in cities.
Contribution
The paper introduces SONYC-UST-V2, a new urban sound tagging dataset with spatiotemporal metadata (recording timestamp and sensor location), together with a baseline evaluation.
Findings
Spatiotemporal data improves sound tag prediction accuracy.
The dataset includes 18,510 recordings with verified annotations.
Baseline models show potential for urban noise monitoring.
Abstract
We present SONYC-UST-V2, a dataset for urban sound tagging with spatiotemporal information. This dataset is intended for the development and evaluation of machine listening systems for real-world urban noise monitoring. While datasets of urban recordings are available, this dataset provides the opportunity to investigate how spatiotemporal metadata can aid in the prediction of urban sound tags. SONYC-UST-V2 consists of 18510 audio recordings from the "Sounds of New York City" (SONYC) acoustic sensor network, including the timestamp of audio acquisition and location of the sensor. The dataset contains annotations by volunteers from the Zooniverse citizen science platform, as well as a two-stage verification with our team. In this article, we describe our data collection procedure and propose evaluation metrics for multilabel classification of urban sound tags. We report the results of a…
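The abstract mentions evaluation metrics for multilabel classification but, as excerpted here, does not say which ones. As an illustration only, the sketch below computes micro-averaged F1, one common choice for multilabel tagging, in plain Python. The function name `micro_f1`, the three example tags, and the toy labels are all hypothetical and are not taken from the dataset or the paper.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 over binary multilabel matrices (rows = clips, columns = tags).

    Counts true positives, false positives, and false negatives across all
    clip/tag pairs, then computes a single precision, recall, and F1.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example with three made-up tags: engine, dog, siren.
y_true = [[1, 0, 0], [0, 1, 1], [1, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
score = micro_f1(y_true, y_pred)  # 4 TP, 1 FP, 1 FN, so precision = recall = 0.8
print(round(score, 3))
```

Micro-averaging pools every clip/tag decision, so frequent tags dominate the score; a macro average (per-tag scores averaged afterwards) would weight rare tags equally.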
Taxonomy
Topics: Music and Audio Processing · Animal Vocal Communication and Behavior · Speech and Audio Processing
