An Independence-promoting Loss for Music Generation with Language Models

Jean-Marie Lemercier, Simon Rouard, Jade Copet, Yossi Adi and Alexandre Défossez

arXiv:2406.02315 · cs.SD · June 11, 2024

TL;DR

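This paper introduces an independence-promoting loss for music generation with language models. By reducing the statistical dependence between the codebooks of the audio tokenizer, it improves generation quality when the language model fits only the product of the codebook marginals, which is faster than modeling their joint distribution.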

Contribution

It proposes a novel loss, based on maximum mean discrepancy and serving as a proxy for mutual information, to regularize the auto-encoder used for multi-codebook music tokenization.
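A minimal sketch of an MMD-based independence penalty follows. The summary does not give the paper's exact formulation, so the details here are assumptions: a Gaussian kernel with a fixed bandwidth, a biased (V-statistic) estimate of the squared MMD, and samples from the product of the marginals approximated by pairing each first-codebook latent with a randomly shuffled second-codebook latent from the same batch. The names `gaussian_kernel`, `mmd2` and `independence_penalty` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(xs: List[Vector], ys: List[Vector], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_k(a: List[Vector], b: List[Vector]) -> float:
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def independence_penalty(z1: List[Vector], z2: List[Vector],
                         bandwidth: float = 1.0, seed: int = 0) -> float:
    """Hypothetical mutual-information proxy between two codebook latents.

    Joint samples keep the pairs (z1[i], z2[i]); product-of-marginals samples
    pair z1[i] with a shuffled z2[perm[i]], which keeps both marginals but
    breaks any dependence between them.
    """
    rng = random.Random(seed)
    joint = [list(a) + list(b) for a, b in zip(z1, z2)]
    perm = list(range(len(z2)))
    rng.shuffle(perm)
    product = [list(z1[i]) + list(z2[j]) for i, j in enumerate(perm)]
    return mmd2(joint, product, bandwidth)
```

Because shuffling preserves each marginal while breaking the pairing, the penalty stays near zero when the two latents are independent and grows as they become dependent. Under these assumptions, a training loop could add it, scaled by some weight, to the auto-encoder's reconstruction loss.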

Findings

01

Reduces statistical dependence between codebooks during auto-encoding.

02

Improves music generation quality when modeling marginal distributions.

03

Enables faster audio generation compared to joint distribution modeling.
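To make the speed argument concrete, the sketch below counts auto-regressive decoding steps for a token sequence of `num_frames` frames and `num_codebooks` codebooks. It assumes two simple strategies that this summary does not spell out: modeling the joint distribution by flattening all codebooks into one sequence (one token per step), versus modeling the product of the marginals by predicting all codebooks of a frame in parallel (one frame per step). The function name and the example sizes are hypothetical.

```python
def autoregressive_steps(num_frames: int, num_codebooks: int, strategy: str) -> int:
    """Number of sequential model calls needed to generate all tokens.

    "flatten":  joint modeling, one token per step -> frames * codebooks steps.
    "parallel": product of marginals, all codebooks of a frame per step -> frames steps.
    """
    if num_frames < 0 or num_codebooks < 1:
        raise ValueError("need num_frames >= 0 and num_codebooks >= 1")
    if strategy == "flatten":
        return num_frames * num_codebooks
    if strategy == "parallel":
        return num_frames
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical example sizes: 1500 frames, 4 codebooks.
print(autoregressive_steps(1500, 4, "flatten"))   # 6000
print(autoregressive_steps(1500, 4, "parallel"))  # 1500
```

Under these assumptions, flattening multiplies the number of sequential steps by the number of codebooks, which is the cost that parallel prediction of the marginals avoids.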

Abstract

Music generation schemes using language modeling rely on a vocabulary of audio tokens, generally provided as codes in a discrete latent space learnt by an auto-encoder. Multi-stage quantizers are often employed to produce these tokens; therefore, the decoding strategy used for token prediction must be adapted to account for multiple codebooks: either it should model the joint distribution over all codebooks, or fit the product of the codebook marginal distributions. Modelling the joint distribution requires a costly increase in the number of auto-regressive steps, while fitting the product of the marginals yields an inexact model unless the codebooks are mutually independent. In this work, we introduce an independence-promoting loss to regularize the auto-encoder used as the tokenizer in language models for music generation. The proposed loss is a proxy for mutual information based on…
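The inexactness of the product-of-marginals model can be quantified: the KL divergence from the joint distribution p(c1, c2) of two codebooks to the product p(c1)p(c2) equals their mutual information, which is zero exactly when the codebooks are independent. The toy sketch below computes it for two codebooks with two codes each; the probability tables are made-up examples, not data from the paper.

```python
import math
from typing import List, Tuple

Table = List[List[float]]  # joint[i][j] = p(c1 = i, c2 = j)


def marginals(joint: Table) -> Tuple[List[float], List[float]]:
    p1 = [sum(row) for row in joint]        # p(c1 = i): sum over each row
    p2 = [sum(col) for col in zip(*joint)]  # p(c2 = j): sum over each column
    return p1, p2


def mutual_information(joint: Table) -> float:
    """KL(p(c1, c2) || p(c1) p(c2)) in nats; zero iff the codebooks are independent."""
    p1, p2 = marginals(joint)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pij in enumerate(row):
            if pij > 0.0:  # terms with p = 0 contribute 0 (0 * log 0 := 0)
                mi += pij * math.log(pij / (p1[i] * p2[j]))
    return mi


independent = [[0.3 * 0.6, 0.3 * 0.4],
               [0.7 * 0.6, 0.7 * 0.4]]
dependent = [[0.5, 0.0],
             [0.0, 0.5]]

print(mutual_information(independent))           # close to 0, up to rounding error
print(round(mutual_information(dependent), 6))   # 0.693147, i.e. ln 2
```

The dependent table cannot be reproduced by any product of its marginals, whereas the independent one is matched exactly; a tokenizer regularized toward independence pushes its codebooks toward the second case.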

Taxonomy

Topics: Music and Audio Processing