Pre-training with Synthetic Patterns for Audio

Yuchi Ishikawa; Tatsuya Komatsu; Yoshimitsu Aoki

arXiv:2410.00511·eess.AS·October 2, 2024

TL;DR

This paper introduces a novel pre-training method for audio encoders using synthetic patterns and Masked Autoencoders, enabling effective learning without real audio data and addressing privacy concerns.

Contribution

It presents a new framework combining MAEs with synthetic data for pre-training audio models, avoiding reliance on real audio datasets.

Findings

1. Achieves performance comparable to models trained on large real audio datasets
2. Partially outperforms image-based pre-training methods
3. Effective across 13 audio tasks and 17 synthetic datasets

Abstract

In this paper, we propose to pre-train audio encoders using synthetic patterns instead of real audio data. Our proposed framework consists of two key elements. The first one is Masked Autoencoder (MAE), a self-supervised learning framework that learns from reconstructing data from randomly masked counterparts. MAEs tend to focus on low-level information such as visual patterns and regularities within data. Therefore, it is unimportant what is portrayed in the input, whether it be images, audio mel-spectrograms, or even synthetic patterns. This leads to the second key element, which is synthetic data. Synthetic data, unlike real audio, is free from privacy and licensing infringement issues. By combining MAEs and synthetic patterns, our framework enables the model to learn generalized feature representations without real data, while addressing the issues related to real audio. To evaluate…
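The masking-and-reconstruction idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the sinusoidal pattern generator, the 16x16 patch size, the 0.75 mask ratio, and every function name here are assumptions chosen for illustration, and a trivial mean-patch predictor stands in for the encoder-decoder that a real MAE would train.

```python
import numpy as np


def make_synthetic_pattern(height=64, width=64, num_waves=3, seed=0):
    """Hypothetical generator: a sum of random 2D sinusoids (no real audio)."""
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:height, 0:width]
    pattern = np.zeros((height, width), dtype=np.float64)
    for _ in range(num_waves):
        fy, fx = rng.uniform(0.01, 0.2, size=2)
        phase = rng.uniform(0.0, 2.0 * np.pi)
        pattern += np.sin(2.0 * np.pi * (fy * ys + fx * xs) + phase)
    return pattern / num_waves


def patchify(image, patch=16):
    """Split a 2D array into non-overlapping, flattened patch x patch blocks."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0
    blocks = image.reshape(h // patch, patch, w // patch, patch)
    return blocks.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


def random_mask(num_patches, mask_ratio=0.75, seed=0):
    """Return (visible_idx, masked_idx); mask_ratio is an assumed value."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_patches)
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    return np.sort(perm[:num_keep]), np.sort(perm[num_keep:])


def masked_mse(pred, target, masked_idx):
    """Reconstruction loss computed on the masked patches only."""
    diff = pred[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))


pattern = make_synthetic_pattern()
patches = patchify(pattern)  # shape: (num_patches, patch * patch)
visible_idx, masked_idx = random_mask(len(patches))

# Placeholder "model": predict every patch as the mean of the visible ones.
# A real MAE would encode the visible patches and decode the masked ones.
prediction = np.tile(patches[visible_idx].mean(axis=0), (len(patches), 1))
loss = masked_mse(prediction, patches, masked_idx)
```

Because the loss only looks at hidden patches, the same pipeline applies unchanged whether the input is an image, a mel-spectrogram, or a synthetic pattern, which is the property the abstract relies on.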

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies
