SLICER: Learning universal audio representations using low-resource self-supervised pre-training
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

TL;DR
SLICER introduces a novel self-supervised learning method combining clustering and contrasting paradigms to pre-train audio encoders, enabling effective low-resource audio and speech classification.
Contribution
The paper proposes SLICER, a new SSL approach that combines instance-level and cluster-level contrastive learning with a new augmentation, k-mix, and reports state-of-the-art results on the LAPE Benchmark.
Findings
Outperforms prior methods on the LAPE Benchmark
Requires significantly less unlabeled data for pre-training
Introduces a new augmentation technique, k-mix
Abstract
We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio data that reduces the need for large amounts of labeled data for audio and speech classification. Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks in a low-resource unlabeled audio pre-training setting. Inspired by the recent success of clustering and contrasting learning paradigms for SSL-based speech representation learning, we propose SLICER (Symmetrical Learning of Instance and Cluster-level Efficient Representations), which brings together the best of both clustering and contrasting learning paradigms. We use a symmetric loss between latent representations from student and teacher encoders and simultaneously solve instance and cluster-level contrastive learning tasks. We obtain cluster representations online by…
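As a rough illustration of the "symmetric loss between latent representations from student and teacher encoders", the sketch below computes an instance-level contrastive (InfoNCE-style) loss in both directions and averages them. It is not the authors' implementation: the function names, the temperature value, and the choice of cosine similarity are all assumptions made for this example. The cluster-level task is left out, because the abstract excerpt above is cut off before it describes how cluster representations are obtained.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (all-zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce(queries, keys, temperature=0.1):
    """Instance-level contrastive loss.

    Row i of `keys` is treated as the positive for row i of `queries`;
    every other row of `keys` acts as a negative. The temperature of 0.1
    is an assumed value, not one taken from the paper.
    """
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, q_i in enumerate(q):
        # Cosine similarity of this query to every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(q_i, k_j)) / temperature for k_j in k]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(q)


def symmetric_contrastive_loss(student, teacher, temperature=0.1):
    """Average the loss in both directions: student->teacher and teacher->student."""
    return 0.5 * (
        info_nce(student, teacher, temperature)
        + info_nce(teacher, student, temperature)
    )


if __name__ == "__main__":
    # Toy embeddings standing in for encoder outputs on two views of a batch.
    student_out = [[1.0, 0.1], [0.2, 1.0], [-1.0, 0.3]]
    teacher_out = [[0.9, 0.0], [0.1, 1.1], [-0.8, 0.2]]
    print(symmetric_contrastive_loss(student_out, teacher_out))
```

Averaging the two directions makes the loss invariant to swapping its arguments, which is one common reading of "symmetric" in this setting; the paper may define the symmetry differently.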
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Contrastive Learning
