A Framework for Generative and Contrastive Learning of Audio Representations
Prateek Verma, Julius Smith

TL;DR
This paper introduces a self-supervised framework combining contrastive and generative transformer models to learn effective audio representations without labeled data, showing promising results in audio understanding tasks.
Contribution
It presents a novel combined approach of contrastive and generative self-supervised learning for audio representations, eliminating the need for ground truth labels.
Findings
Achieves competitive performance compared to supervised methods.
Demonstrates effectiveness of contrastive learning for audio.
Shows promise of transformer-based generative models for audio understanding.
Abstract
In this paper, we present a framework for contrastive learning of audio representations in a self-supervised setting, without access to any ground-truth labels. The core idea in self-supervised contrastive learning is to map an audio signal and its various augmented versions (representative of salient aspects of audio such as pitch and timbre) to a space where they are close together and are separated from other, different signals. In addition, we explore generative models based on state-of-the-art transformer-based architectures for learning latent spaces for audio signals, again without access to any labels. Here, we map audio signals on a smaller scale to discrete dictionary elements and train transformers to predict the next dictionary element. We use only the data itself as a method of supervision, bypassing the need for labels to act as supervision for training the deep neural…
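The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the NT-Xent-style loss, the temperature value, the nearest-neighbour codebook lookup, and the names `nt_xent_loss` and `to_next_token_pairs` are all assumptions chosen for illustration, and the paper's actual loss, encoder, and dictionary construction may differ.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.1):
    """NT-Xent-style contrastive loss (assumed here; the paper's loss may differ).

    z_a[i] and z_b[i] are embeddings of two augmented views of the same clip.
    Each view should be closer to its partner than to every other embedding.
    """
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm -> cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                    # a view is never its own positive
    n = z_a.shape[0]
    # Row i (from z_a) has its positive at row i + n, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

def to_next_token_pairs(frames, codebook):
    """Map each short frame to its nearest codebook entry (hypothetical dictionary),
    then return (input, target) sequences for next-element prediction."""
    dist = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = dist.argmin(axis=1)
    return tokens[:-1], tokens[1:]
```

In this sketch, embeddings of matching views yield a lower loss than embeddings of unrelated clips, and the token pairs are what a transformer would be trained on: given the elements up to position t, predict the element at position t + 1.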
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Contrastive Learning
