AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

Sebastian Sager; Benjamin Elizalde; Damian Borth; Christian Schulze; Bhiksha Raj; Ian Lane

arXiv:1607.03766·cs.SD·January 10, 2018

TL;DR

This paper introduces AudioPairBank, a corpus of over 33,000 audio files labeled with 1,123 adjective-noun and verb-noun pairs, and reports sound recognition experiments on these labels that reached 70% accuracy.

Contribution

It provides the first dataset with adjective-noun and verb-noun labels for audio and analyzes their correlation with sound content.

Findings

01 Collected and processed 33,000+ audio files with 1,123 label pairs.

02 Achieved 70% accuracy in recognizing audio content with these labels.

03 Documented challenges and implications of collecting nuanced audio annotations.

Abstract

Recently, sound recognition has been used to identify sounds, such as car and river. However, sounds have nuances that may be better described by adjective-noun pairs such as slow car, and verb-noun pairs such as flying insects, which are underexplored. Therefore, in this work we investigate the relation between audio content and both adjective-noun pairs and verb-noun pairs. Due to the lack of datasets with these kinds of annotations, we collected and processed the AudioPairBank corpus consisting of a combined total of 1,123 pairs and over 33,000 audio files. One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels. A second contribution is to show the degree of correlation between the audio content and the labels through sound recognition experiments, which yielded results of 70% accuracy,…
