S-KEY: Self-supervised Learning of Major and Minor Keys from Audio

Yuexuan Kong; Gabriel Meseguer-Brocal; Vincent Lostanlen; Mathieu Lagrange; Romain Hennequin

arXiv:2501.12907·cs.SD·June 24, 2025

TL;DR

This paper introduces S-KEY, a self-supervised neural network that estimates major and minor keys from music audio without human labels. It uses transposition-invariant chroma features as a source of pseudo-labels and scales to a training set of one million songs.

Contribution

S-KEY extends the STONE architecture and learning objective with an auxiliary pretext task whose pseudo-labels come from transposition-invariant chroma features, enabling self-supervised learning of major and minor keys at large scale.

Findings

01 Matches supervised state-of-the-art accuracy on FMAKv2 and GTZAN datasets.

02 Requires no human annotation and maintains the same parameter budget as STONE.

03 Successfully trained on a dataset of one million songs, demonstrating scalability.

Abstract

STONE, the current method in self-supervised learning for tonality estimation in music signals, cannot distinguish relative keys, such as C major versus A minor. In this article, we extend the neural network architecture and learning objective of STONE to perform self-supervised learning of major and minor keys (S-KEY). Our main contribution is an auxiliary pretext task to STONE, formulated using transposition-invariant chroma features as a source of pseudo-labels. S-KEY matches the supervised state of the art in tonality estimation on FMAKv2 and GTZAN datasets while requiring no human annotation and having the same parameter budget as STONE. We build upon this result and expand the training set of S-KEY to a million songs, thus showing the potential of large-scale self-supervised learning in music information retrieval.
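To make the relative-key ambiguity concrete, here is a minimal sketch in plain Python. It is an illustration only, not S-KEY's or STONE's implementation: the helper names (`scale_pitch_classes`, `binary_chroma`, `transpose_chroma`) and the idealized binary 12-bin chroma vectors are assumptions introduced for this example. It shows that C major and A natural minor contain exactly the same pitch classes, and that transposing by a number of semitones amounts to a circular shift of the chroma bins.

```python
# Illustrative sketch only; names and binary chroma vectors are hypothetical,
# not taken from the S-KEY code base.
# Pitch classes: 0 = C, 1 = C#, ..., 9 = A, ..., 11 = B.
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]          # intervals of the major scale
NATURAL_MINOR_STEPS = [0, 2, 3, 5, 7, 8, 10]  # intervals of the natural minor scale


def scale_pitch_classes(tonic, steps):
    """Return the set of pitch classes of the scale built on `tonic`."""
    return {(tonic + s) % 12 for s in steps}


def binary_chroma(pitch_classes):
    """Idealized 12-bin chroma: 1 if the pitch class is in the scale, else 0."""
    return [1 if pc in pitch_classes else 0 for pc in range(12)]


def transpose_chroma(chroma, semitones):
    """Transpose a 12-bin chroma vector upward by circularly shifting its bins."""
    k = semitones % 12
    return chroma[-k:] + chroma[:-k] if k else list(chroma)


c_major = scale_pitch_classes(0, MAJOR_STEPS)
a_minor = scale_pitch_classes(9, NATURAL_MINOR_STEPS)
d_major = scale_pitch_classes(2, MAJOR_STEPS)

# Relative keys share the same pitch-class set, so pitch content alone
# cannot tell C major from A minor.
relative_keys_share_notes = c_major == a_minor

# Shifting C major up by 2 semitones yields the chroma of D major.
shift_matches_d_major = transpose_chroma(binary_chroma(c_major), 2) == binary_chroma(d_major)

print(relative_keys_share_notes, shift_matches_d_major)
```

Because the two relative keys are indistinguishable from their pitch-class sets alone, a method that separates them needs some additional cue about mode, which is the gap the auxiliary pretext task described in the abstract is meant to address.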

Code & Models

Repositories

deezer/s-key
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Speech and Audio Processing