Generating Sample-Based Musical Instruments Using Neural Audio Codec Language Models
Shahan Nercessian, Johannes Imort, Ninon Devis, and Frederik Blang

TL;DR
This paper presents a neural audio codec language model approach for generating sample-based musical instruments conditioned on text or reference audio, addressing timbral consistency challenges and introducing new evaluation metrics.
Contribution
The paper extends a neural audio codec language model to condition on pitch, velocity, and a combined text/audio embedding, and introduces three conditioning schemes aimed at maintaining timbral consistency across the notes of a generated instrument.
Findings
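The approach produces compelling musical instruments, as judged by objective metrics and human listening tests.
A new objective metric is introduced to evaluate the timbral consistency of generated instruments.
The average CLAP score is adapted for the text-to-instrument case.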
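As a rough illustration of what an averaged CLAP score could look like for a whole instrument, the sketch below takes one text-prompt embedding and one audio embedding per generated sample and averages their cosine similarities. The function names and the plain-list embeddings are assumptions made for illustration; the summary does not spell out how the paper adapts the score, and no real CLAP model is called here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def average_clap_score(text_embedding, audio_embeddings):
    """Mean text-audio similarity over all samples of one instrument.

    Assumption: `audio_embeddings` holds one embedding per generated
    sample (for example, one per pitch/velocity pair), all produced by
    the same hypothetical audio encoder as `text_embedding`'s text encoder.
    """
    if not audio_embeddings:
        raise ValueError("at least one audio embedding is required")
    scores = [cosine_similarity(text_embedding, e) for e in audio_embeddings]
    return sum(scores) / len(scores)
```

In practice the embeddings would come from a pretrained CLAP text encoder and audio encoder; here they are left as plain lists so the averaging step stays visible.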
Abstract
In this paper, we propose and investigate the use of neural audio codec language models for the automatic generation of sample-based musical instruments based on text or reference audio prompts. Our approach extends a generative audio framework to condition on pitch across an 88-key spectrum, velocity, and a combined text/audio embedding. We identify maintaining timbral consistency within the generated instruments as a major challenge. To tackle this issue, we introduce three distinct conditioning schemes. We analyze our methods through objective metrics and human listening tests, demonstrating that our approach can produce compelling musical instruments. Specifically, we introduce a new objective metric to evaluate the timbral consistency of the generated instruments and adapt the average Contrastive Language-Audio Pretraining (CLAP) score for the text-to-instrument case, noting that…
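To make the conditioning inputs concrete, the sketch below enumerates the (pitch, velocity) pairs that one sample-based instrument would cover. It assumes the 88 keys map to MIDI notes 21 (A0) through 108 (C8), the standard piano range, and that velocities are MIDI values from 1 to 127. The `NoteCondition` type, the function name, and the example velocity layers are hypothetical, and this is not the authors' implementation.

```python
from dataclasses import dataclass

LOWEST_MIDI_NOTE = 21    # A0, lowest key of an 88-key keyboard
HIGHEST_MIDI_NOTE = 108  # C8, highest key of an 88-key keyboard


@dataclass(frozen=True)
class NoteCondition:
    """One conditioning pair for a single generated sample (hypothetical type)."""
    pitch: int
    velocity: int


def instrument_conditions(velocities):
    """Return every (pitch, velocity) pair for one 88-key instrument.

    Assumption: the same text/audio prompt embedding would be reused
    for all of these pairs, so only pitch and velocity vary per sample.
    """
    for velocity in velocities:
        if not 1 <= velocity <= 127:
            raise ValueError(f"velocity out of MIDI range: {velocity}")
    return [
        NoteCondition(pitch, velocity)
        for pitch in range(LOWEST_MIDI_NOTE, HIGHEST_MIDI_NOTE + 1)
        for velocity in velocities
    ]
```

For example, `instrument_conditions([64, 127])` yields 176 pairs, two velocity layers for each of the 88 keys. Generating each pair independently is where timbral consistency becomes a concern, since every sample must sound like the same instrument.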
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
