Self-Supervised Convolutional Audio Models are Flexible Acoustic Feature Learners: A Domain Specificity and Transfer-Learning Study

Mattson Ogg

arXiv:2502.02366·eess.AS·February 5, 2025

TL;DR

Self-supervised convolutional audio models trained on diverse data can effectively learn flexible acoustic features, performing well across various speech and non-speech tasks with minimal domain-specific tuning.

Contribution

This study demonstrates that SSL models pre-trained on different audio domains exhibit broad transferability, often matching or surpassing domain-specific models in downstream tasks.

Findings

01

Pre-trained models perform well across multiple tasks.

02

Minimal domain-specificity advantages observed.

03

SSL models outperform some domain-specific baselines.

Abstract

Self-supervised learning (SSL) algorithms have emerged as powerful tools that can leverage large quantities of unlabeled audio data to pre-train robust representations that support strong performance on diverse downstream tasks. Up to now these have mostly been developed separately for speech and non-speech applications. Here, we explored the domain specificity of a convolutional model's pre-training data relative to different downstream speech and non-speech tasks using a self-supervised pre-training approach (BYOL-A). We found that these pre-trained models (regardless of whether they were pre-trained on speech data, non-speech data or both) enabled good performance on nearly all downstream tasks, beating or nearly matching the performance of popular domain-specific models. Only small domain-specificity advantages were observed between the different pre-training datasets. The popular…

Code & Models

Repositories

mogg64/byola_domainxfer
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing