Contrastive Learning from Synthetic Audio Doppelgängers

Manuel Cherep; Nikhil Singh

arXiv:2406.05923·cs.SD·March 4, 2025

TL;DR

This paper introduces a contrastive learning method using synthetic audio generated by perturbing sound synthesizer parameters, which improves audio representation quality and reduces data requirements.

Contribution

The paper presents an approach that uses synthetic audio pairs for contrastive learning, outperforming methods trained on real data while requiring minimal hyperparameters.

Findings

01

Synthetic audio pairs enhance contrastive learning effectiveness.

02

The method outperforms real-data-based approaches on standard tasks.

03

The approach is lightweight and has no data storage needs.

Abstract

Learning robust audio representations currently demands extensive datasets of real-world sound recordings. By applying artificial transformations to these recordings, models can learn to recognize similarities despite subtle variations through techniques like contrastive learning. However, these transformations are only approximations of the true diversity found in real-world sounds, which are generated by complex interactions of physical processes, from vocal cord vibrations to the resonance of musical instruments. We propose a solution to both the data scale and transformation limitations, leveraging synthetic audio. By randomly perturbing the parameters of a sound synthesizer, we generate audio doppelgängers: synthetic positive pairs with causally manipulated variations in timbre, pitch, and temporal envelopes. These variations, difficult to achieve through augmentations of existing…
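
The pair-generation idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: `toy_synth` is a hypothetical four-parameter additive synthesizer (pitch, brightness, attack, decay), `make_doppelganger_pair` is a hypothetical helper, and `delta` is an assumed name for the perturbation scale. The authors' actual synthesizer and perturbation scheme are not described in this excerpt.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate for this toy example
DURATION = 1.0       # clip length in seconds (assumption)


def toy_synth(params, sr=SAMPLE_RATE, duration=DURATION):
    """Render audio from a parameter vector in [0, 1]^4.

    Hypothetical stand-in for a real synthesizer: parameters control
    pitch, brightness (number of harmonics), attack, and decay.
    """
    pitch, brightness, attack, decay = params
    t = np.arange(int(sr * duration)) / sr

    # Pitch: map [0, 1] to three octaves, 110 Hz to 880 Hz.
    f0 = 110.0 * 2.0 ** (3.0 * pitch)

    # Timbre: 1 to 8 harmonics with 1/k amplitude roll-off
    # (highest partial is 8 * 880 Hz = 7040 Hz, below Nyquist).
    n_harmonics = 1 + int(round(7 * brightness))
    wave = sum(np.sin(2 * np.pi * f0 * k * t) / k
               for k in range(1, n_harmonics + 1))

    # Temporal envelope: linear attack followed by exponential decay.
    attack_time = 0.005 + 0.2 * attack
    decay_rate = 1.0 + 9.0 * decay
    envelope = np.minimum(t / attack_time, 1.0) * np.exp(-decay_rate * t)

    audio = wave * envelope
    return audio / (np.max(np.abs(audio)) + 1e-9)  # peak-normalize


def make_doppelganger_pair(rng, delta=0.25, n_params=4):
    """Sample random parameters, perturb them, and render both sounds."""
    anchor = rng.uniform(0.0, 1.0, size=n_params)
    # Gaussian perturbation scaled by delta, clipped to the valid range.
    positive = np.clip(anchor + delta * rng.normal(size=n_params), 0.0, 1.0)
    return toy_synth(anchor), toy_synth(positive), anchor, positive


rng = np.random.default_rng(0)
audio_a, audio_b, params_a, params_b = make_doppelganger_pair(rng)
```

In a contrastive setup, `audio_a` and `audio_b` would be treated as a positive pair, with sounds rendered from other randomly sampled parameter vectors serving as negatives. Because pairs are rendered on demand, nothing needs to be stored on disk in this sketch.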

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques

Methods: Contrastive Learning