You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection
Satvik Venkatesh, David Moffat, Eduardo Reck Miranda

TL;DR
This paper introduces YOHO, a novel YOLO-inspired algorithm for audio segmentation and sound event detection that improves accuracy and inference speed by transforming the task into a regression problem.
Contribution
The paper presents a new end-to-end regression-based approach for audio event detection, reducing complexity and increasing speed compared to traditional classification methods.
Findings
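YOHO achieves a relative F-measure improvement of 1% to 6% over the state-of-the-art Convolutional Recurrent Neural Network across multiple datasets.
Inference with YOHO is at least 6 times faster than with segmentation-by-classification.
Post-processing and smoothing are approximately 7 times faster with YOHO, because acoustic boundaries are predicted directly.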
Abstract
Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio-indexing, and music information retrieval. In recent years, most research articles have adopted segmentation-by-classification. This technique divides audio into small frames and performs classification on each frame individually. In this paper, we present a novel approach called You Only Hear Once (YOHO), which is inspired by the YOLO algorithm popularly adopted in Computer Vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. This is done by having separate output neurons to detect the presence of an audio class and predict its start and end points. The relative improvement for F-measure of YOHO, compared…
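The regression formulation described in the abstract can be illustrated with a small decoding sketch. It is not the authors' implementation. It assumes a hypothetical output layout in which each time step carries three values per class (presence, start, end), with start and end expressed as fractions of that time step. The function name `decode_yoho_like`, the `step_seconds` parameter, the 0.5 threshold, and the merging rule are all illustrative assumptions.

```python
import numpy as np


def decode_yoho_like(output, class_names, step_seconds, threshold=0.5):
    """Turn a (steps, 3 * classes) array into merged (label, start, end) events.

    Assumed layout (hypothetical, for illustration only): for class index c,
    column 3c is presence, 3c + 1 is the start and 3c + 2 is the end,
    both given as fractions of the current time step.
    """
    steps, width = output.shape
    assert width == 3 * len(class_names), "expected three values per class"

    events = []
    for c, label in enumerate(class_names):
        presence = output[:, 3 * c]
        rel_start = output[:, 3 * c + 1]
        rel_end = output[:, 3 * c + 2]

        current = None  # [start, end] of the event being built, in seconds
        for t in range(steps):
            if presence[t] < threshold:
                continue  # class not detected in this time step
            start = (t + float(rel_start[t])) * step_seconds
            end = (t + float(rel_end[t])) * step_seconds
            if current is not None and start <= current[1] + 1e-9:
                # Touches or overlaps the running event: extend it.
                current[1] = max(current[1], end)
            else:
                if current is not None:
                    events.append((label, current[0], current[1]))
                current = [start, end]
        if current is not None:
            events.append((label, current[0], current[1]))
    return events


# Toy input: 3 time steps of 1 second each, 2 classes.
demo = np.array([
    # speech: presence, start, end | music: presence, start, end
    [0.9, 0.5, 1.0, 0.0, 0.0, 0.0],
    [0.8, 0.0, 0.25, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.7, 0.25, 0.75],
])
events = decode_yoho_like(demo, ["speech", "music"], step_seconds=1.0)
print(events)
```

Because each time step already yields a start and an end, the decoder only has to merge neighbouring predictions. A frame-based classifier would instead need its per-frame labels smoothed and grouped before boundaries can be read off.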
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies
Methods: You Only Look Once · You Only Hypothesize Once
