Identifying Actions for Sound Event Classification
Benjamin Elizalde, Radu Revutchi, Samarjit Das, Bhiksha Raj, Ian Lane, Laurie M. Heller

TL;DR
This paper introduces a psychology-inspired method for sound event classification in which human listeners identify the actions that may have produced each sound. The annotations are turned into semantic Action Vectors (AVs), and combining AVs with audio features reaches 88% classification accuracy.
Contribution
It proposes integrating human action annotations into sound event classification, improving accuracy over audio-only methods.
Findings
Achieved 88% classification accuracy by combining Action Vectors with audio features.
Crowdsourcing effectively identified actions related to sound events.
First use of human action annotations to improve sound event classification.
Abstract
In Psychology, actions are paramount for humans to identify sound events. In Machine Learning (ML), action recognition achieves high accuracy; however, it has not been asked whether identifying actions can benefit Sound Event Classification (SEC), as opposed to mapping the audio directly to a sound event. Therefore, we propose a new Psychology-inspired approach for SEC that includes identification of actions via human listeners. To achieve this goal, we used crowdsourcing to have listeners identify 20 actions that in isolation or in combination may have produced any of the 50 sound events in the well-studied dataset ESC-50. The resulting annotations for each audio recording relate actions to a database of sound events for the first time. The annotations were used to create semantic representations called Action Vectors (AVs). We evaluated SEC by comparing the AVs with two types of audio…
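To make the idea of an Action Vector concrete, here is a minimal sketch in Python. The abstract does not say how the AVs are computed or how they are combined with audio features, so everything here is an assumption for illustration: the AV is taken to be the fraction of listeners who selected each of the 20 actions, the combination is a plain concatenation, and the names `action_vector`, `combine`, the listener choices, and the 128-dimensional audio feature are all hypothetical.

```python
import numpy as np

NUM_ACTIONS = 20  # the abstract states 20 actions; the index order here is arbitrary


def action_vector(listener_choices, num_actions=NUM_ACTIONS):
    """Assumed AV: per-action fraction of listeners who selected that action.

    listener_choices holds one list of action indices per listener,
    all for the same audio recording.
    """
    votes = np.zeros(num_actions, dtype=float)
    for chosen in listener_choices:
        for idx in set(chosen):  # count each action at most once per listener
            votes[idx] += 1.0
    return votes / max(len(listener_choices), 1)


def combine(av, audio_features):
    """Assumed combination: concatenate the AV with an audio feature vector."""
    return np.concatenate([av, np.asarray(audio_features, dtype=float)])


# Hypothetical annotations from three listeners for one recording.
choices = [[0, 3], [3], [3, 7]]
av = action_vector(choices)

# Placeholder audio features; a real system would use learned or hand-crafted ones.
audio = np.random.default_rng(0).normal(size=128)
x = combine(av, audio)
```

The combined vector `x` could then be passed to any standard classifier over the 50 ESC-50 classes; which classifier and which audio features the authors used is not stated in the text above.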
