Evaluating robustness of You Only Hear Once (YOHO) Algorithm on noisy audios in the VOICe Dataset

Soham Tiwari; Kshitiz Lakhotia; Manjunath Mulimani

arXiv:2111.01205 · cs.SD · November 3, 2021 · 1 citation

Open Access · 1 Repo

TL;DR

This paper evaluates the robustness of the YOHO sound event detection algorithm on noisy audio data from the VOICe dataset, demonstrating its competitive performance and faster inference times compared to state-of-the-art methods.

Contribution

It introduces the application of the YOHO algorithm to noisy audio data and assesses its performance on the VOICe dataset, highlighting its efficiency and robustness.

Findings

01 YOHO outperforms or matches state-of-the-art algorithms on the VOICe dataset.

02 YOHO achieves faster inference times than comparable methods.

03 YOHO maintains robustness across different signal-to-noise ratio (SNR) levels.

Abstract

Sound event detection (SED) in machine listening entails identifying the different sounds in an audio file and determining the start and end time of each sound event. SED finds use in various applications such as audio surveillance, speech recognition, and context-based indexing and retrieval of data in a multimedia database. However, in real-life scenarios, audio from various sources is seldom free of interfering noise or disturbance. In this paper, we test the performance of the You Only Hear Once (YOHO) algorithm on noisy audio data. Inspired by the You Only Look Once (YOLO) algorithm in computer vision, the YOHO algorithm can match the performance of various state-of-the-art algorithms on datasets such as Music Speech Detection, TUT Sound Event, and Urban-SED, but at lower inference times. In this paper, we explore the performance…

Code & Models

Repositories

sohamtiwari3120/yoho-on-voice
pytorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: You Only Hear Once