COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations

Ruben Ciranni; Giorgio Mariani; Michele Mancusi; Emilian Postolache; Giorgio Fabbro; Emanuele Rodolà; Luca Cosmo

arXiv:2404.16969 · cs.SD · January 10, 2025

TL;DR

COCOLA is a contrastive learning method for musical audio representations that captures harmonic and rhythmic coherence between samples, enabling objective evaluation of music accompaniment generation models.

Contribution

The paper proposes a coherence-oriented contrastive learning method that operates on the stems composing music tracks, and uses the learned representations to benchmark accompaniment generation models that established metrics handle poorly.

Findings

01. Captures harmonic and rhythmic coherence between musical audio samples

02. Enables objective evaluation of music accompaniment generation models

03. Releases model checkpoints trained on public stem datasets (MUSDB18-HQ, MoisesDB, Slakh2100, CocoChorales)

Abstract

We present COCOLA (Coherence-Oriented Contrastive Learning for Audio), a contrastive learning method for musical audio representations that captures the harmonic and rhythmic coherence between samples. Our method operates at the level of the stems composing music tracks and can input features obtained via Harmonic-Percussive Separation (HPS). COCOLA allows the objective evaluation of generative models for music accompaniment generation, which are difficult to benchmark with established metrics. In this regard, we evaluate recent music accompaniment generation models, demonstrating the effectiveness of the proposed method. We release the model checkpoints trained on public datasets containing separate stems (MUSDB18-HQ, MoisesDB, Slakh2100, and CocoChorales).
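
The abstract does not spell out the training objective, so the following is only a minimal sketch of how a contrastive objective over stems might look, assuming an InfoNCE-style loss with cosine similarity. The function names (`cosine`, `info_nce`), the temperature value, and the toy embeddings are all hypothetical and stand in for the outputs of an audio encoder; they are not taken from the paper. The idea illustrated is that two sub-mixes of stems drawn from the same window of the same track form a coherent (positive) pair, while sub-mixes from other tracks in the batch act as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (assumed objective, not confirmed by the source).

    anchors[i] and positives[i] are embeddings of two stem sub-mixes cut
    from the same window of the same track (a coherent pair).
    positives[j] with j != i come from other tracks and act as negatives.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of picking the coherent partner.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical values standing in for encoder outputs).
anchors = [[1.0, 0.0], [0.0, 1.0]]
coherent = [[0.9, 0.1], [0.1, 0.9]]   # partners aligned with their anchors
shuffled = [[0.1, 0.9], [0.9, 0.1]]   # partners swapped across tracks

loss_coherent = info_nce(anchors, coherent)
loss_shuffled = info_nce(anchors, shuffled)
```

In this toy setup the loss is lower when each anchor is paired with its own partner than when the partners are swapped, which is the behavior a coherence score built on such embeddings would rely on when ranking generated accompaniments.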

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies

Methods: Contrastive Learning