The bag-of-frames approach: a not so sufficient model for urban soundscapes
Mathieu Lagrange (IRCCyN), Grégoire Lafay (IRCCyN), Boris Defreville, Jean-Julien Aucouturier

TL;DR
This paper critically evaluates the bag-of-frames approach for urban soundscape modeling, revealing its limitations on realistic datasets and suggesting a need to focus on individual acoustical events for better representation.
Contribution
It provides a conceptual replication of the BOF approach with new datasets, demonstrating its inadequacy and proposing a shift towards event-based analysis.
Findings
BOF performs poorly on realistic datasets
Original high accuracy was dataset-specific
Event-based modeling may improve soundscape representation
Abstract
The "bag-of-frames" approach (BOF), which encodes audio signals as the long-term statistical distribution of short-term spectral features, is commonly regarded as an effective and sufficient way to represent environmental sound recordings (soundscapes) since its introduction in an influential 2007 article. The present paper describes a conceptual replication of this seminal article using several new soundscape datasets, with results strongly questioning the adequacy of the BOF approach for the task. We show that the good accuracy originally reported with BOF likely results from a particularly thankful dataset with low within-class variability, and that for more realistic datasets, BOF in fact does not perform significantly better than a mere one-point average of the signal's features. Soundscape modeling, therefore, may not be the closed case it was once thought to be. Progress, we…
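To make the two representations compared in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes log-magnitude spectra as the short-term features and a single diagonal Gaussian as the long-term distribution, which is a deliberate simplification of whatever features and mixture model the paper actually uses. All function names, frame sizes, and the synthetic noise "scenes" are hypothetical.

```python
import numpy as np

def frame_features(signal, frame_len=1024, hop=512):
    """Short-term log-magnitude spectra, one row per frame (assumed feature choice)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def bof_model(features):
    """Bag of frames: discard frame order, keep per-dimension mean and variance."""
    return features.mean(axis=0), features.var(axis=0) + 1e-8

def symmetric_kl(model_a, model_b):
    """Symmetrised KL divergence between two diagonal Gaussians (closed form)."""
    (mu_a, var_a), (mu_b, var_b) = model_a, model_b
    diff2 = (mu_a - mu_b) ** 2
    kl_ab = 0.5 * np.sum(np.log(var_b / var_a) + (var_a + diff2) / var_b - 1.0)
    kl_ba = 0.5 * np.sum(np.log(var_a / var_b) + (var_b + diff2) / var_a - 1.0)
    return kl_ab + kl_ba

def one_point_distance(features_a, features_b):
    """Baseline: each recording is reduced to its average feature vector."""
    return np.linalg.norm(features_a.mean(axis=0) - features_b.mean(axis=0))

# Two synthetic "recordings": white noise at different levels (illustrative only).
rng = np.random.default_rng(0)
scene_a = rng.normal(size=16000)
scene_b = 0.1 * rng.normal(size=16000)

feats_a, feats_b = frame_features(scene_a), frame_features(scene_b)
d_bof = symmetric_kl(bof_model(feats_a), bof_model(feats_b))
d_avg = one_point_distance(feats_a, feats_b)
```

Both distances can drive the same nearest-neighbour classifier, so the comparison in the abstract amounts to asking whether the extra distributional information in `bof_model` buys any accuracy over the plain average used by `one_point_distance`.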
