A Framework for Generative and Contrastive Learning of Audio Representations

Prateek Verma; Julius Smith

arXiv:2010.11459 · cs.SD · March 18, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a self-supervised framework combining contrastive and generative transformer models to learn effective audio representations without labeled data, showing promising results in audio understanding tasks.

Contribution

It presents a novel combined approach of contrastive and generative self-supervised learning for audio representations, eliminating the need for ground truth labels.

Findings

01. Achieves competitive performance compared to supervised methods.

02. Demonstrates effectiveness of contrastive learning for audio.

03. Shows promise of transformer-based generative models for audio understanding.

Abstract

In this paper, we present a framework for contrastive learning of audio representations in a self-supervised setting, without access to any ground truth labels. The core idea in self-supervised contrastive learning is to map an audio signal and its various augmented versions (representative of salient aspects of audio such as pitch, timbre, etc.) to a space where they are close together and are separated from other, different signals. In addition, we explore generative models based on state-of-the-art transformer-based architectures for learning latent spaces for audio signals, again without access to any labels. Here, we map audio signals on a smaller scale to discrete dictionary elements and train transformers to predict the next dictionary element. We use only the data itself as a method of supervision, bypassing the need for labels to act as supervision for training the deep neural…
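The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not state the exact contrastive loss, so an NT-Xent-style softmax loss is assumed here as a stand-in. The codebook is a fixed, hypothetical one, and the transformer that would be trained on the resulting tokens is omitted. All function names and parameters (`contrastive_loss`, `quantize`, `next_token_pairs`, `temperature`) are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def contrastive_loss(anchors, positives, temperature=0.1):
    """Assumed NT-Xent-style loss: anchors[i] and positives[i] embed a clip and
    its augmented version; every other row in the batch acts as a negative."""
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature                # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # matched pairs sit on the diagonal

def quantize(frames, codebook):
    """Map each short audio frame to the index of its nearest dictionary element."""
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def next_token_pairs(tokens):
    """Shift by one step: the model sees tokens[:-1] and must predict tokens[1:]."""
    return tokens[:-1], tokens[1:]

# Contrastive part: random vectors stand in for encoder outputs (assumption).
rng = np.random.default_rng(0)
clips = rng.normal(size=(8, 16))
augmented = clips + 0.01 * rng.normal(size=clips.shape)  # near-identical views
unrelated = rng.normal(size=(8, 16))                     # unrelated signals
loss_matched = contrastive_loss(clips, augmented)
loss_unrelated = contrastive_loss(clips, unrelated)

# Generative part: a toy two-entry codebook (hypothetical, not learned).
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
frames = np.array([[0.1, 0.0], [0.9, 1.1], [0.2, -0.1]])
tokens = quantize(frames, codebook)        # nearest entries: 0, 1, 0
inputs, targets = next_token_pairs(tokens)
```

In this sketch the loss is low when each clip's embedding lies close to its augmented view and far from the other clips in the batch, which is the geometric property the abstract describes. The token pairs show only how the data supplies its own targets for next-element prediction.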

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies

Methods: Contrastive Learning