TL;DR
This paper introduces an unsupervised contrastive learning approach for sound event representations, leveraging data augmentation and mixing techniques to improve recognition performance with limited or noisy labeled data.
Contribution
It proposes a contrastive learning framework for sound events that increases robustness to noisy labels and reduces reliance on manually labeled data, reportedly outperforming supervised baselines.
Findings
Unsupervised contrastive pre-training improves sound event classification accuracy.
The method increases robustness against noisy labels.
It mitigates data scarcity issues in sound recognition tasks.
Abstract
Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data, a common scenario in sound event research. In this work, we explore unsupervised contrastive learning as a way to learn sound event representations. To this end, we propose to use the pretext task of contrasting differently augmented views of sound events. The views are computed primarily via mixing of training examples with unrelated backgrounds, followed by other data augmentations. We analyze the main components of our method via ablation experiments. We evaluate the learned representations using linear evaluation, and in two in-domain downstream sound event classification tasks, namely, using limited manually labeled data, and using noisy labeled data. Our results suggest that unsupervised contrastive pre-training can mitigate the…
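Illustrative sketch
The pretext task in the abstract (two views of the same sound event, each mixed with an unrelated background, then contrasted against the other examples in a batch) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The helper names `mix_with_background` and `nt_xent`, the plain convex blend, the mixing weight `alpha`, and the temperature are all assumptions; the abstract does not specify them. A real pipeline would also pass each view through an encoder and apply the further augmentations the abstract mentions, which are omitted here.

```python
import math
import random


def mix_with_background(example, background, alpha):
    """Blend an example with an unrelated background of equal length.

    ASSUMPTION: a plain convex blend; alpha is the background weight,
    kept small so the original sound event stays dominant.
    """
    return [(1.0 - alpha) * x + alpha * b for x, b in zip(example, background)]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nt_xent(views_a, views_b, temperature=0.2):
    """Normalized temperature-scaled cross-entropy over N positive pairs.

    views_a[i] and views_b[i] are two views of the same example; every
    other vector in the batch acts as a negative for that pair.
    """
    z = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        positive_logit = cosine(z[i], z[positive]) / temperature
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - positive_logit
    return total / (2 * n)


if __name__ == "__main__":
    random.seed(0)
    dim, batch = 16, 4
    # Stand-ins for time-frequency patches; a real system would use encoder outputs.
    examples = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]
    backgrounds = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2 * batch)]
    view_a = [mix_with_background(x, backgrounds[i], 0.2) for i, x in enumerate(examples)]
    view_b = [mix_with_background(x, backgrounds[batch + i], 0.2) for i, x in enumerate(examples)]
    print("contrastive loss:", nt_xent(view_a, view_b))
```

The loss is low when the two views of each example are closer to each other than to any other vector in the batch, which is what pushes the representation to ignore the mixed-in background.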
Taxonomy
Methods: Contrastive Learning
