Acoustic Scene Classification
Daniele Barchiesi, Dimitrios Giannoulis, Dan Stowell, Mark D. Plumbley

TL;DR
This paper reviews the state of the art in acoustic scene classification (ASC), introduces a benchmark dataset and evaluation framework, compares the submitted algorithms with each other and with human listeners, and finds sufficient evidence that only three algorithms significantly outperform the MFCC/GMM baseline.
Contribution
It defines a general framework for ASC, presents a benchmark dataset together with performance metrics and statistical significance tests, and compares the submitted algorithms with each other and with human accuracy.
Findings
Three algorithms significantly outperform the baseline (MFCCs, GMMs and a maximum likelihood criterion).
The best algorithm's mean accuracy matches the median accuracy of human listeners.
Every scene is correctly classified by at least some human listeners.
Abstract
In this article we present an account of the state-of-the-art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce. Starting from a historical review of previous research in this area, we define a general framework for ASC and present different implementations of its components. We then describe a range of different algorithms submitted for a data challenge that was held to provide a general and fair benchmark for ASC techniques. The dataset recorded for this purpose is presented, along with the performance metrics that are used to evaluate the algorithms and statistical significance tests to compare the submitted methods. We use a baseline method that employs MFCCs, GMMs and a maximum likelihood criterion as a benchmark, and only find sufficient evidence to conclude that three algorithms significantly outperform it. We also…
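To make the baseline named in the abstract more concrete, the following is a minimal sketch of maximum likelihood scene classification over frame-level features. It rests on several assumptions that are not taken from the paper: the MFCC frames are assumed to be already extracted, a single diagonal-covariance Gaussian per scene stands in for a full GMM, and the scene labels, the toy data and every function name are hypothetical. It illustrates the decision rule only and is not the authors' implementation.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Fit a per-dimension mean and variance to a list of feature frames.

    Assumption: one diagonal Gaussian replaces the multi-component GMM.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)  # variance floor
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(frames, model):
    """Sum of per-frame log-likelihoods, treating frames as independent."""
    mean, var = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def train(labelled_clips):
    """labelled_clips maps a scene label to a list of clips (lists of frames)."""
    return {
        label: fit_diag_gaussian([f for clip in clips for f in clip])
        for label, clips in labelled_clips.items()
    }


def classify(frames, models):
    """Maximum likelihood criterion: pick the scene whose model scores highest."""
    return max(models, key=lambda label: log_likelihood(frames, models[label]))


def fake_clip(rng, centre, n_frames=50, dim=3):
    """Hypothetical stand-in for the MFCC frames of one recording."""
    return [[rng.gauss(centre, 1.0) for _ in range(dim)] for _ in range(n_frames)]


rng = random.Random(0)
training = {
    "office": [fake_clip(rng, 0.0) for _ in range(3)],  # hypothetical label
    "street": [fake_clip(rng, 5.0) for _ in range(3)],  # hypothetical label
}
models = train(training)
predicted = classify(fake_clip(rng, 5.0), models)
print(predicted)
```

In a fuller version one would presumably replace the synthetic frames with MFCCs computed from audio and fit a multi-component mixture per scene; the final step, choosing the scene whose model assigns the highest likelihood to the clip, would stay the same.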
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
