Exploring compressibility of transformer based text-to-music (TTM) models

Vasileios Moschopoulos; Thanasis Kotsiopoulos; Pablo Peso Parada; Konstantinos Nikiforidis; Alexandros Stergiadis; Gerasimos Papakostas; Md Asif Jalal; Jisi Zhang; Anastasios Drosou; Karthikeyan Saravanan

arXiv:2406.17159·eess.AS·June 26, 2024

Open Access

TL;DR

This paper analyzes how far large text-to-music models can be compressed using knowledge distillation and component-specific modifications, and presents a smaller model that keeps music generation quality competitive.

Contribution

It introduces TinyTTM, a compressed TTM model with 89.2M parameters (roughly 6x fewer than the 557.6M of MusicGen-Small), and explores the trade-offs between model size and generation performance.

Findings

01

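TinyTTM achieves a FAD of 3.66 and a KL of 1.32 on MusicBench, better than MusicGen-Small (557.6M params) but not better than MusicGen-Small fine-tuned on MusicBench.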

02

The compression methods target deployment on resource-constrained devices such as mobile phones, where current TTM models are infeasible.

03

Trade-offs between compression and generation performance are analyzed across the TTM components (encoder, generative model, and decoder).

Abstract

State-of-the-art Text-To-Music (TTM) generative AI models are large and require desktop- or server-class compute, making them infeasible for deployment on mobile phones. This paper presents an analysis of trade-offs between model compression and generation performance of TTM models. We study compression through knowledge distillation and specific modifications that enable applicability over the various components of the TTM model (encoder, generative model and the decoder). Leveraging these methods, we create TinyTTM (89.2M params), which achieves a FAD of 3.66 and a KL of 1.32 on the MusicBench dataset (lower is better for both), better than MusicGen-Small (557.6M params) but not lower than MusicGen-Small fine-tuned on MusicBench.
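For readers unfamiliar with knowledge distillation, the following is a minimal sketch of a generic soft-target distillation loss, in which a small student is trained to match a large teacher's output distribution. It is not the authors' training objective, which the abstract does not specify. The function names, the temperature value, and the toy logits are all assumptions made for illustration. The KL term here is a training loss and should not be confused with the KL evaluation score quoted in the abstract, which is a separate metric. The parameter counts are the only values taken from the abstract.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Generic soft-target loss: T^2 * KL(teacher || student).

    Hypothetical helper for illustration; not taken from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)

# Toy logits over three tokens (illustrative values only).
teacher = [2.0, 1.0, 0.1]
student_close = [1.9, 1.1, 0.0]
student_far = [0.1, 1.0, 2.0]

loss_close = distillation_loss(teacher, student_close)
loss_far = distillation_loss(teacher, student_far)
print(f"loss (student close to teacher): {loss_close:.4f}")
print(f"loss (student far from teacher): {loss_far:.4f}")

# Size reduction implied by the parameter counts in the abstract.
compression_ratio = 557.6 / 89.2
print(f"MusicGen-Small / TinyTTM parameter ratio: {compression_ratio:.2f}x")
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what lets a smaller model absorb the behaviour of a larger one during training.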

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies

Methods: Knowledge Distillation