Evaluating robustness of You Only Hear Once (YOHO) Algorithm on noisy audios in the VOICe Dataset
Soham Tiwari, Kshitiz Lakhotia, Manjunath Mulimani

TL;DR
This paper evaluates the robustness of the YOHO sound event detection algorithm on noisy audio data from the VOICe dataset, demonstrating its competitive performance and faster inference times compared to state-of-the-art methods.
Contribution
The paper applies the YOHO algorithm to noisy audio data and assesses its performance on the VOICe dataset, highlighting its efficiency and robustness.
Findings
YOHO outperforms or matches state-of-the-art algorithms on the VOICe dataset.
YOHO achieves faster inference times than comparable methods.
YOHO maintains robustness across different SNR levels.
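As a rough illustration of what an "SNR level" means for a noisy mixture, the sketch below scales a noise signal so that the event-to-noise power ratio hits a chosen value in decibels before the two are added. It is not the VOICe dataset's generation pipeline or the paper's code: the function names (`mix_at_snr`, `measured_snr_db`), the toy signals, and the SNR values used are all assumptions made for illustration.

```python
import math


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(event, noise, snr_db):
    """Scale `noise` so that power(event) / power(scaled noise) matches
    `snr_db` (in dB), then add it to `event`.

    Returns (mixture, scaled_noise). Hypothetical helper, not from the paper.
    """
    if len(event) != len(noise):
        raise ValueError("event and noise must have the same length")
    p_event = power(event)
    p_noise = power(noise)
    if p_event == 0 or p_noise == 0:
        raise ValueError("event and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_event / P_noise)  =>  P_noise = P_event / 10^(SNR_dB / 10)
    target_noise_power = p_event / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled_noise = [gain * n for n in noise]
    mixture = [e + n for e, n in zip(event, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(event, scaled_noise):
    """SNR in dB between an event signal and the noise that was added to it."""
    return 10 * math.log10(power(event) / power(scaled_noise))


if __name__ == "__main__":
    # Toy signals: a 440 Hz tone as the "event", a deterministic sequence as "noise".
    event = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noise = [((i * 7919) % 101) / 50 - 1 for i in range(1600)]
    for target in (-6.0, 0.0, 6.0):  # illustrative values, not the dataset's
        _, scaled = mix_at_snr(event, noise, target)
        print(target, round(measured_snr_db(event, scaled), 6))
```

Lower (more negative) SNR values mean the noise dominates the event, which is the harder condition for any detector.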
Abstract
Sound event detection (SED) in machine listening entails identifying the different sounds in an audio file, along with the start and end time of each sound event. SED finds use in various applications such as audio surveillance, speech recognition, and context-based indexing and retrieval of data in a multimedia database. However, in real-life scenarios, audio from various sources is seldom free of interfering noise or disturbance. In this paper, we test the performance of the You Only Hear Once (YOHO) algorithm on noisy audio data. Inspired by the You Only Look Once (YOLO) algorithm in computer vision, the YOHO algorithm can match the performance of various state-of-the-art algorithms on datasets such as the Music Speech Detection Dataset, TUT Sound Event, and Urban-SED, but at lower inference times. In this paper, we explore the performance…
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: You Only Hear Once (YOHO)
