Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
Dading Chong, Helin Wang, Peilin Zhou, Qingcheng Zeng

TL;DR
This paper introduces MaskSpec, a self-supervised learning method that masks and reconstructs spectrogram patches to learn effective audio representations, outperforming previous models on multiple audio classification benchmarks.
Contribution
The paper proposes masked spectrogram prediction (MaskSpec), a self-supervised pre-training approach for transformer-based audio models that improves downstream performance without extra supervision or extra model weights.
Findings
Achieves state-of-the-art results on AudioSet with 0.471 mAP
Outperforms previous pre-trained models on multiple datasets
Demonstrates effectiveness of masked spectrogram reconstruction for audio tasks
Abstract
Transformer-based models attain excellent results and generalize well when trained on sufficient amounts of data. However, constrained by the limited data available in the audio domain, most transformer-based models for audio tasks are fine-tuned from models pre-trained in other domains (e.g., images), which differ notably from the audio domain. Other methods explore self-supervised learning directly in the audio domain but currently do not perform well on downstream tasks. In this paper, we present a novel self-supervised learning method for transformer-based audio models, called masked spectrogram prediction (MaskSpec), to learn powerful audio representations from unlabeled audio data (AudioSet is used in this paper). Our method masks random patches of the input spectrogram and reconstructs the masked regions with an encoder-decoder architecture. Without using extra…
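The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (patchify, random_mask, masked_mse), the 16 x 16 patch size, the 0.75 mask ratio, and the choice to score only the masked patches with mean squared error are all assumptions made for illustration. The encoder-decoder is replaced by a placeholder that predicts zeros.

```python
import random


def patchify(spec, ph, pw):
    """Split a 2-D spectrogram (list of rows) into non-overlapping
    ph x pw patches, each flattened to a list of values."""
    n_rows, n_cols = len(spec), len(spec[0])
    assert n_rows % ph == 0 and n_cols % pw == 0, "shape must divide evenly"
    patches = []
    for r in range(0, n_rows, ph):
        for c in range(0, n_cols, pw):
            patches.append(
                [spec[r + i][c + j] for i in range(ph) for j in range(pw)]
            )
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Pick a random subset of patch indices to mask.
    Returns (masked indices, visible indices), both sorted."""
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return sorted(masked), visible


def masked_mse(pred, target, masked_idx):
    """Mean squared error over the masked patches only (an assumption)."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


rng = random.Random(0)

# Toy "spectrogram": 32 frequency bins x 64 time frames of random values.
spec = [[rng.random() for _ in range(64)] for _ in range(32)]

# Assumed patch size 16 x 16 gives 2 * 4 = 8 patches of 256 values each.
patches = patchify(spec, 16, 16)

# Assumed mask ratio: hide 75% of the patches.
masked_idx, visible_idx = random_mask(len(patches), 0.75, rng)

# Placeholder for the encoder-decoder: predict zeros for every patch.
pred = [[0.0] * len(p) for p in patches]
loss = masked_mse(pred, patches, masked_idx)
```

In a real setup, a model would see only the visible patches and be trained to drive this loss down on the masked ones. The toy predictor here just exercises the bookkeeping.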
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Structural Health Monitoring Techniques
