Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation
Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

TL;DR
This paper introduces MC-SimCLR, a contrastive learning framework for spatial audio that improves event classification and localization by learning joint spectral and spatial representations through multi-level data augmentation.
Contribution
It proposes a novel multi-channel contrastive learning method with a multi-level augmentation pipeline for spatial audio representation learning.
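To make the contrastive objective concrete, here is a minimal NumPy sketch of the NT-Xent loss used by the original SimCLR framework. This summary does not state which loss MC-SimCLR uses, so treating it as NT-Xent is an assumption based on the name; the function name `nt_xent_loss` and the temperature value are also illustrative.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for two batches of embeddings of shape (n, d).

    Row i of z1 and row i of z2 are assumed to come from two augmented
    views of the same clip (a positive pair); all other rows are negatives.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                    # a sample is never its own negative
    # Index of the positive partner for each of the 2n rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = np.log(np.exp(sim - row_max).sum(axis=1)) + row_max[:, 0]
    log_prob = sim[np.arange(2 * n), pos] - log_denom
    return -log_prob.mean()

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
loss_aligned = nt_xent_loss(z, z)                             # views agree perfectly
loss_random = nt_xent_loss(z, rng.standard_normal((8, 16)))   # views are unrelated
```

Under this sketch, embeddings of matching views yield a lower loss than embeddings of unrelated views, which is the pressure that pulls the two augmented views of one recording together.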
Findings
Linear layers trained on top of the learned representation outperform supervised models in both event classification accuracy and localization error.
Augmentation methods significantly impact representation quality.
Fine-tuning with less labeled data remains effective.
Abstract
In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode 'what' and 'where' of spatial audios. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audios, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods to randomly swap the order of the microphones and mask Mel and GCC channels. By using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We…
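The two channel-wise augmentations named in the abstract (swapping microphone order and masking Mel and GCC channels) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the array shapes, the masking probability `p`, the rule of always keeping one channel, and the function names `swap_microphones` and `mask_channels` are all assumptions.

```python
import numpy as np

def swap_microphones(waveform, rng):
    """Randomly permute the microphone axis of a (mics, samples) waveform.

    The permutation is returned so a caller can keep track of the new order.
    """
    perm = rng.permutation(waveform.shape[0])
    return waveform[perm], perm

def mask_channels(features, rng, p=0.25):
    """Zero out whole channels of a (channels, freq, time) feature array.

    Each channel is dropped independently with probability p (assumed value);
    at least one channel is always kept so the input is never all zeros.
    """
    keep = rng.random(features.shape[0]) >= p
    if not keep.any():
        keep[rng.integers(features.shape[0])] = True
    return features * keep[:, None, None], keep

rng = np.random.default_rng(0)
wave = rng.standard_normal((4, 16000))   # hypothetical 4-microphone clip
swapped, perm = swap_microphones(wave, rng)

mel = rng.standard_normal((4, 64, 100))  # hypothetical per-microphone Mel features
masked, keep = mask_channels(mel, rng)
```

The same `mask_channels` helper could be applied to a stack of GCC features, since it only assumes that the first axis indexes channels.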
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Acoustic Wave Phenomena Research
Methods: Contrastive Learning
