What Makes Audio Event Detection Harder than Classification?

Huy Phan; Philipp Koch; Fabrice Katzberg; Marco Maass; Radoslaw Mazur; Ian McLoughlin; Alfred Mertins

arXiv:1612.09089·cs.SD·May 18, 2018

TL;DR

This paper analyzes why audio event detection is more challenging than classification and introduces an improved detection pipeline with a verification step that significantly enhances detection performance across various models.

Contribution

It provides a detailed analysis of the challenges in audio event detection and proposes a verification-based pipeline that improves detection accuracy using high-quality classifiers.

Findings

01. Verification step improves detection performance

02. Consistent improvements across multiple detector-classifier combinations

03. Significant performance gains on ITC-Irst dataset

Abstract

It is commonly observed that audio event classification is easier to deal with than detection. So far, this observation has been accepted as fact, without careful analysis. In this paper, we examine the reasons behind it and, more importantly, leverage them to benefit the audio event detection task. We present an improved detection pipeline in which a verification step is appended to augment a detection system. This step employs a high-quality event classifier to postprocess the benign event hypotheses output by the detection system and reject false alarms. To demonstrate the effectiveness of the proposed pipeline, we implement and pair up different event detectors based on the most common detection schemes with various event classifiers, ranging from the standard bag-of-words model to the state-of-the-art bank-of-regressors one. Experimental results on the…
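The verification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` structure, the `verify` function, the `min_confidence` threshold, and the toy classifier are all hypothetical names and choices introduced here, and the acceptance rule (the classifier must agree with the detector's label) is an assumption about how a classifier could be used to reject false alarms.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """One event hypothesis from a detector (hypothetical structure)."""
    onset: float   # start time in seconds
    offset: float  # end time in seconds
    label: str     # event class proposed by the detector


# Assumed interface: a classifier maps a segment (onset, offset)
# to a (predicted label, confidence) pair.
Classifier = Callable[[float, float], Tuple[str, float]]


def verify(hypotheses: List[Hypothesis],
           classify: Classifier,
           min_confidence: float = 0.5) -> List[Hypothesis]:
    """Keep only hypotheses that the classifier confirms.

    Assumed rule: a hypothesis survives if the classifier predicts the
    same label with at least `min_confidence`; otherwise it is treated
    as a false alarm and dropped.
    """
    kept = []
    for hyp in hypotheses:
        predicted, confidence = classify(hyp.onset, hyp.offset)
        if predicted == hyp.label and confidence >= min_confidence:
            kept.append(hyp)
    return kept


def toy_classifier(onset: float, offset: float) -> Tuple[str, float]:
    """Stand-in classifier: only segments of 0.5 s or longer count as events."""
    if offset - onset >= 0.5:
        return "door_slam", 0.9
    return "background", 0.6


# Two detector outputs: one plausible event and one very short false alarm.
detections = [
    Hypothesis(1.0, 2.0, "door_slam"),
    Hypothesis(3.0, 3.1, "door_slam"),
]
verified = verify(detections, toy_classifier)
```

In this sketch the detector and classifier stay decoupled, so any detector that emits segment hypotheses could be paired with any classifier exposing the assumed interface, which mirrors the detector-classifier pairings the abstract describes.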
