SLICER: Learning universal audio representations using low-resource self-supervised pre-training

Ashish Seth; Sreyan Ghosh; S. Umesh; Dinesh Manocha

arXiv:2211.01519 · eess.AS · May 19, 2023

TL;DR

SLICER introduces a novel self-supervised learning method combining clustering and contrasting paradigms to pre-train audio encoders, enabling effective low-resource audio and speech classification.

Contribution

It proposes SLICER, a new SSL approach that integrates instance and cluster-level contrastive learning with a novel augmentation, achieving state-of-the-art results on audio benchmarks.

Findings

01 Outperforms prior methods on the LAPE Benchmark

02 Requires significantly less unlabeled data for pre-training

03 Introduces a new augmentation technique, k-mix

Abstract

We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio data that reduces the need for large amounts of labeled data for audio and speech classification. Our primary aim is to learn audio representations that generalize across a large variety of speech and non-speech tasks in a low-resource unlabeled audio pre-training setting. Inspired by the recent success of clustering and contrasting learning paradigms for SSL-based speech representation learning, we propose SLICER (Symmetrical Learning of Instance and Cluster-level Efficient Representations), which brings together the best of both clustering and contrasting learning paradigms. We use a symmetric loss between latent representations from student and teacher encoders and simultaneously solve instance and cluster-level contrastive learning tasks. We obtain cluster representations online by…
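The symmetric student/teacher objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: the InfoNCE form of the loss, the temperature value, the equal weighting of the two terms, and every function name are assumptions made for illustration. The abstract is cut off before it says how cluster representations are obtained, so the sketch simply takes them as precomputed vectors.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.1):
    """Mean cross-entropy in which queries[i] should match keys[i].

    Assumption: a standard InfoNCE loss over cosine similarities; the
    paper's exact loss may differ.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


def symmetric_contrastive_loss(student, teacher, temperature=0.1):
    """Average the loss over both directions so neither encoder is favored."""
    return 0.5 * (
        info_nce(student, teacher, temperature)
        + info_nce(teacher, student, temperature)
    )


def combined_loss(student_inst, teacher_inst, student_clus, teacher_clus,
                  temperature=0.1):
    """Instance-level plus cluster-level term (equal weights are an assumption).

    student_clus / teacher_clus are hypothetical precomputed cluster
    representations, one vector per cluster.
    """
    instance_term = symmetric_contrastive_loss(student_inst, teacher_inst, temperature)
    cluster_term = symmetric_contrastive_loss(student_clus, teacher_clus, temperature)
    return instance_term + cluster_term
```

When the student and teacher views agree, the loss is close to zero; when the pairing is scrambled, it grows. Swapping the two arguments of `symmetric_contrastive_loss` leaves its value unchanged, which is the property the word "symmetric" refers to in this sketch.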

Code & Models

Repositories

Sreyan88/LAPE
pytorch

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Contrastive Learning