Contrastive Environmental Sound Representation Learning

Peter Ochieng; Dennis Kaburu

arXiv:2207.08825 · cs.SD · July 20, 2022 · 1 citation

TL;DR

This paper introduces a self-supervised contrastive learning approach using a shallow 1D CNN to extract robust environmental sound representations from raw audio and spectrograms, improving recognition accuracy.

Contribution

It proposes a novel contrastive learning method with multi-input fusion via CCA for environmental sound representation without annotations.
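
As a rough illustration of a contrastive objective of this kind, the sketch below computes an NT-Xent (SimCLR-style) loss over paired embeddings with NumPy. The summary here does not say which contrastive loss the paper uses, so NT-Xent, the function name `nt_xent_loss`, and the `temperature` default are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for two (N, D) batches where z_a[i] and z_b[i] form a positive pair.

    Illustrative stand-in only: the loss actually used in the paper is not
    stated in this summary.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-length rows
    sim = z @ z.T / temperature                        # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own candidate

    # Index of the positive partner for every row: i <-> i + N.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))

    # Mean negative log-probability assigned to the positive partner.
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

In a setup like the one summarized here, `z_a` and `z_b` would be encoder outputs for two views of the same audio clips; that pairing is also an assumption. The loss is lower when paired rows point in similar directions and all other rows in the batch do not.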

Findings

01 Achieved a 12.8% improvement on the ESC-50 dataset.

02 Achieved a 0.9% improvement on the UrbanSound8K dataset.

03 Demonstrated the robustness of fused features over individual representations.

Abstract

Machine hearing of environmental sound is one of the important problems in the audio recognition domain. It gives a machine the ability to discriminate between different input sounds, which guides its decision making. In this work we exploit the self-supervised contrastive technique and a shallow 1D CNN to extract distinctive audio features (audio representations) without using any explicit annotations. We generate representations of a given audio clip from both its raw waveform and its spectrogram, and evaluate whether the proposed learner is agnostic to the type of audio input. We further use canonical correlation analysis (CCA) to fuse the representations from the two input types and demonstrate that the fused global feature gives a more robust representation of the audio signal than the individual representations. The evaluation of the proposed technique is…
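
The CCA fusion step mentioned in the abstract can be sketched with plain NumPy as below. This is a minimal linear CCA under stated assumptions: the function names `cca_fuse` and `_inv_sqrt`, the ridge term `reg`, and fusing by concatenating the two projected views are all choices made for illustration. The excerpt does not specify how the paper combines the projections.

```python
import numpy as np


def _inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca_fuse(x, y, k, reg=1e-6):
    """Project two views onto their top-k canonical directions and concatenate.

    x: (N, Dx) features from one input type (e.g. raw waveform).
    y: (N, Dy) features from the other input type (e.g. spectrogram).
    Returns the (N, 2k) fused features and the k canonical correlations.
    Concatenation is an assumed fusion rule, not taken from the paper.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]

    # Regularized covariance and cross-covariance estimates.
    sxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    syy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    sxy = x.T @ y / (n - 1)

    # Whiten each view, then take the SVD of the whitened cross-covariance.
    sxx_is, syy_is = _inv_sqrt(sxx), _inv_sqrt(syy)
    u, s, vt = np.linalg.svd(sxx_is @ sxy @ syy_is)

    wx = sxx_is @ u[:, :k]      # canonical directions for x
    wy = syy_is @ vt.T[:, :k]   # canonical directions for y
    fused = np.concatenate([x @ wx, y @ wy], axis=1)
    return fused, s[:k]
```

Here `k` must not exceed the smaller of the two feature dimensions. The returned singular values are the canonical correlations, so values near 1 indicate that the two views agree strongly along that direction.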

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies

Methods: 1-Dimensional Convolutional Neural Networks