Eventness: Object Detection on Spectrograms for Temporal Localization of Audio Events
Phuong Pham, Juncheng Li, Joseph Szurley, Samarjit Das

TL;DR
This paper introduces 'Eventness', a concept analogous to Objectness in computer vision, for detecting audio events in spectrograms by treating them as objects, and adapts visual object detection models for this task.
Contribution
The paper proposes the concept of Eventness for audio event detection and adapts a visual object detection model to spectrogram analysis. It achieves results comparable to state-of-the-art baselines and detects minority events more robustly.
Findings
Comparable results with state-of-the-art baselines
More robust detection of minority events
Effective adaptation of visual object detection models to audio spectrograms
Abstract
In this paper, we introduce the concept of Eventness for audio event detection, which can, in part, be thought of as an analogue to Objectness from computer vision. The key observation behind the eventness concept is that audio events reveal themselves as 2-dimensional time-frequency patterns with specific textures and geometric structures in spectrograms. These time-frequency patterns can then be viewed analogously to objects occurring in natural images (with the exception that scaling and rotation invariance properties do not apply). With this key observation in mind, we pose the problem of detecting monophonic or polyphonic audio events as an equivalent visual object(s) detection problem under partial occlusion and clutter in spectrograms. We adapt a state-of-the-art visual object detection model to evaluate the audio event detection task on publicly available datasets. The proposed…
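The abstract treats a spectrogram as an image in which audio events appear as 2-D time-frequency patterns. The sketch below makes that framing concrete under stated assumptions. It computes a log-magnitude STFT with NumPy so that rows index frequency and columns index time. It then maps a detected box back to an onset/offset in seconds and a frequency band in Hz, which is how a box on the spectrogram yields a temporal localization. The function names, the STFT parameters (`n_fft=1024`, `hop=512`), the box layout `(t0, f0, t1, f1)` and the frame-center time convention are all hypothetical choices for illustration, not the authors' implementation. The box is written by hand as a stand-in for the output of an object detector.

```python
import numpy as np

def log_spectrogram(signal, n_fft=1024, hop=512):
    """Hypothetical helper: log-magnitude STFT as a 2-D "image".

    Returns an array of shape (n_fft // 2 + 1, n_frames):
    rows are frequency bins, columns are time frames.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    return np.log(mag + 1e-10).T               # transpose: frequency x time

def box_to_event(box, sr, hop, n_fft):
    """Hypothetical helper: map a box (t0_frame, f0_bin, t1_frame, f1_bin)
    to (onset_s, offset_s, low_hz, high_hz).

    Assumes frame t is located at its center, (t * hop + n_fft / 2) / sr seconds.
    """
    t0, f0, t1, f1 = box
    onset = (t0 * hop + n_fft / 2) / sr
    offset = (t1 * hop + n_fft / 2) / sr
    hz_per_bin = sr / n_fft
    return onset, offset, f0 * hz_per_bin, f1 * hz_per_bin

# Toy input: 1 s at 16 kHz with a 1000 Hz tone burst between 0.25 s and 0.5 s.
sr = 16000
t = np.arange(sr) / sr
sig = np.where((t >= 0.25) & (t < 0.5), np.sin(2 * np.pi * 1000 * t), 0.0)

spec = log_spectrogram(sig)
print(spec.shape)  # (513, 30)

# Hand-written box standing in for a detector's output around the burst.
print(box_to_event((8, 60, 15, 70), sr=sr, hop=512, n_fft=1024))
```

The two axes carry different physical units (seconds versus Hz), so swapping them is not a neutral transformation. One way to read the abstract's remark that rotation invariance does not apply is that an event's orientation in the time-frequency plane is itself informative.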
