Spectrograms Are Sequences of Patches

Leyi Zhao; Yi Li

arXiv:2210.15988 · cs.SD · October 31, 2022 · 1 citation

TL;DR

This paper introduces a self-supervised learning approach treating music spectrograms as sequences of patches, leveraging NLP and CV techniques to improve audio representations without labeled data.

Contribution

The work proposes a novel patch-based self-supervised model for music spectrograms, demonstrating the effectiveness of sequential patch modeling in audio tasks.

Findings

01. Model achieves competitive results on downstream tasks.

02. Treating spectrograms as patch sequences is effective.

03. Self-supervised learning reduces reliance on labeled data.

Abstract

Self-supervised pre-training models have been used successfully in several machine learning domains, but very little of this work addresses music. We treat a music spectrogram as a series of patches and design Patchifier, a self-supervised model that captures the features of these sequential patches and draws on self-supervised learning methods from both the NLP and CV domains. Pre-training uses no labels, only a subset of the MTAT dataset containing 16k music clips. After pre-training, we apply the model to several downstream tasks, where it achieves acceptable results compared to other audio representation models. Our work also demonstrates that it makes sense to consider audio as a series of patch segments.
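To make the idea of "a spectrogram as a series of patches" concrete, the following is a minimal sketch of one way to cut a 2-D spectrogram into a sequence of flattened patches. It is not taken from the paper or its repository: the function name `patchify`, the patch sizes, the time-major ordering of the sequence, and the choice to crop any remainder are all assumptions made for illustration.

```python
import numpy as np


def patchify(spec: np.ndarray, patch_h: int, patch_w: int) -> np.ndarray:
    """Split a (freq, time) spectrogram into a sequence of flattened patches.

    Hypothetical helper, not the paper's implementation. Patches do not
    overlap, and rows/columns that do not fill a whole patch are cropped.
    Returns an array of shape (num_patches, patch_h * patch_w).
    """
    n_freq, n_time = spec.shape
    rows = n_freq // patch_h   # number of patches along the frequency axis
    cols = n_time // patch_w   # number of patches along the time axis

    # Drop the remainder so the array divides evenly into patches.
    cropped = spec[: rows * patch_h, : cols * patch_w]

    # (rows, patch_h, cols, patch_w) -> (cols, rows, patch_h, patch_w):
    # assumed time-major order, so patches from the same time step are adjacent.
    grid = cropped.reshape(rows, patch_h, cols, patch_w).transpose(2, 0, 1, 3)

    # Flatten each patch into a vector, giving one "token" per patch.
    return grid.reshape(rows * cols, patch_h * patch_w)


if __name__ == "__main__":
    # Stand-in for a mel spectrogram: 128 frequency bins, 1000 time frames.
    fake_spec = np.random.rand(128, 1000).astype(np.float32)
    tokens = patchify(fake_spec, patch_h=16, patch_w=16)
    print(tokens.shape)  # (496, 256): 8 frequency rows x 62 time columns
```

Each row of the result plays the role a token plays in NLP or an image patch plays in CV, which is what lets sequence models from those domains be applied to audio. Whether the sequence should run time-major or frequency-major, and whether patches should overlap, are design choices this sketch does not settle.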

Code & Models

Repositories

annihi1ation/patchifier-neo
none · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis