Identifying Actions for Sound Event Classification

Benjamin Elizalde; Radu Revutchi; Samarjit Das; Bhiksha Raj; Ian Lane; Laurie M. Heller

arXiv:2104.12693·cs.SD·August 9, 2021

TL;DR

This paper introduces a psychology-inspired method for sound event classification in which human listeners identify the actions that produced each sound. The annotations are turned into semantic Action Vectors that, combined with audio features, reach 88% classification accuracy.

Contribution

It proposes a novel approach that integrates human action annotations into sound event classification, enhancing accuracy over traditional audio-only methods.

Findings

01. Achieved 88% classification accuracy by combining Action Vectors with audio features.

02. Crowdsourcing effectively identified actions related to sound events.

03. First use of human action annotations to improve sound event classification.

Abstract

In Psychology, actions are paramount for humans to identify sound events. In Machine Learning (ML), action recognition achieves high accuracy; however, it has not been asked whether identifying actions can benefit Sound Event Classification (SEC), as opposed to mapping the audio directly to a sound event. Therefore, we propose a new Psychology-inspired approach for SEC that includes identification of actions via human listeners. To achieve this goal, we used crowdsourcing to have listeners identify 20 actions that in isolation or in combination may have produced any of the 50 sound events in the well-studied dataset ESC-50. The resulting annotations for each audio recording relate actions to a database of sound events for the first time. The annotations were used to create semantic representations called Action Vectors (AVs). We evaluated SEC by comparing the AVs with two types of audio…
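To make the idea of an Action Vector concrete, here is a minimal Python sketch of one way listener annotations could be turned into a vector and joined with audio features. It is not the authors' implementation. The three action names, the `action_vector` and `combine` helpers, the scoring rule (fraction of listeners who chose each action), and fusion by concatenation are all assumptions made for illustration. The abstract states only that 20 actions were annotated and that the resulting vectors were evaluated alongside audio features.

```python
from typing import Dict, List, Sequence

# Hypothetical action vocabulary. The paper uses 20 actions, which are not
# listed in the text above, so these three names are placeholders.
ACTIONS: List[str] = ["scraping", "dripping", "ringing"]


def action_vector(
    responses: Sequence[Sequence[str]],
    actions: Sequence[str] = ACTIONS,
) -> List[float]:
    """Return, per action, the fraction of listeners who selected it.

    `responses` holds one entry per listener: the actions that listener
    judged to have produced the recording. This scoring rule is an
    assumption, not the paper's stated procedure.
    """
    if not responses:
        raise ValueError("need at least one listener response")
    index: Dict[str, int] = {name: i for i, name in enumerate(actions)}
    counts = [0] * len(actions)
    for selected in responses:
        for name in set(selected):  # count each action once per listener
            counts[index[name]] += 1
    return [count / len(responses) for count in counts]


def combine(av: Sequence[float], audio_features: Sequence[float]) -> List[float]:
    """Join an Action Vector with audio features by concatenation (assumed)."""
    return list(av) + list(audio_features)


# Four hypothetical listeners annotating one recording.
responses = [
    ["dripping"],
    ["dripping", "ringing"],
    ["dripping"],
    ["scraping", "dripping"],
]
av = action_vector(responses)        # one score per action in ACTIONS
features = combine(av, [0.1, -0.3])  # toy two-dimensional audio embedding
```

In this toy case every listener chose "dripping", so that action scores 1.0 while the other two score 0.25. A classifier trained on `features` would see both the listener-derived action evidence and the audio representation.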
