Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning

Aurian Quelennec; Pierre Chouteau; Geoffroy Peeters; Slim Essid

arXiv:2502.12031 · cs.SD · June 5, 2025

TL;DR

The paper introduces MATPAC, a self-supervised audio representation learning method that combines masked latent prediction with unsupervised classification, achieving state-of-the-art results on multiple audio datasets.

Contribution

It proposes a novel joint training approach with two pretext tasks, enhancing latent space representations for better downstream classification performance.

Findings

1. MATPAC outperforms existing self-supervised methods on several datasets.
2. It surpasses supervised methods in musical auto-tagging.
3. Ablation studies confirm the effectiveness of the joint pretext tasks.

Abstract

Recently, self-supervised learning methods based on masked latent prediction have proven to encode input data into powerful representations. However, during training, the learned latent space can be further transformed to extract higher-level information that could be more suited for downstream classification tasks. Therefore, we propose a new method: MAsked latenT Prediction And Classification (MATPAC), which is trained with two pretext tasks solved jointly. As in previous work, the first pretext task is a masked latent prediction task, ensuring a robust input representation in the latent space. The second one is unsupervised classification, which utilises the latent representations of the first pretext task to match probability distributions between a teacher and a student. We validate the MATPAC method by comparing it to other state-of-the-art proposals and conducting ablations…
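
The two pretext tasks described in the abstract can be illustrated with a minimal sketch of a joint objective. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the mean squared error used for the latent prediction term, the cross-entropy used to match teacher and student distributions, the temperature values, the weight `alpha`, and all function names.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def prediction_loss(predicted, target):
    """Mean squared error between student-predicted latents and teacher
    target latents for the masked positions (assumed distance)."""
    count = sum(len(p) for p in predicted)
    squared = sum(
        (a - b) ** 2
        for p, t in zip(predicted, target)
        for a, b in zip(p, t)
    )
    return squared / count


def classification_loss(student_logits, teacher_logits,
                        student_temp=0.1, teacher_temp=0.05):
    """Cross-entropy from teacher to student class distributions,
    averaged over positions (assumed way of matching distributions)."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        teacher_probs = [math.exp(v) for v in log_softmax(t, teacher_temp)]
        student_log_probs = log_softmax(s, student_temp)
        total -= sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))
    return total / len(student_logits)


def joint_loss(predicted, target, student_logits, teacher_logits, alpha=0.5):
    """Weighted sum of the two pretext losses; alpha is a hypothetical weight."""
    cls = classification_loss(student_logits, teacher_logits)
    pred = prediction_loss(predicted, target)
    return alpha * cls + (1.0 - alpha) * pred
```

In this sketch, `predicted` and `target` are lists of latent vectors for the masked positions, and the logits are per-position scores over a set of pseudo-classes. In a real training loop the teacher outputs would be treated as fixed targets, so gradients would flow only through the student.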

Code & Models

Repositories

aurianworld/matpac
pytorch
