Music ControlNet: Multiple Time-varying Controls for Music Generation
Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan

TL;DR
Music ControlNet is a diffusion-based music generation model that adds precise, time-varying controls (melody, dynamics, and rhythm) to generated audio. Text prompts mainly steer global attributes such as genre, mood, and tempo; these controls go further by shaping how the music changes from moment to moment.
Contribution
The paper presents a diffusion-based music generation model that accepts multiple time-varying controls at once. The controls can be extracted from reference audio or specified for only part of the clip, which makes control both more precise and more flexible.
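A partially specified control can be pictured as a per-frame control sequence plus a binary mask marking which frames the creator actually provided. Unspecified frames are replaced by a neutral fill value. The sketch below assumes this mask-and-fill representation and uses a hypothetical helper `apply_partial_control`. It illustrates the idea and is not the paper's exact procedure.

```python
from typing import List, Sequence, Tuple

def apply_partial_control(
    control: Sequence[Sequence[float]],
    start: int,
    end: int,
    fill: float = 0.0,
) -> Tuple[List[List[float]], List[int]]:
    """Keep control frames in [start, end) and blank out the rest.

    control: frames x dims (e.g. 1 dim for a dynamics curve).
    Returns the masked control and a per-frame mask where
    1 = specified by the creator and 0 = left for the model to decide.
    """
    masked: List[List[float]] = []
    mask: List[int] = []
    for t, frame in enumerate(control):
        if start <= t < end:
            masked.append(list(frame))  # creator-provided frame, kept as-is
            mask.append(1)
        else:
            masked.append([fill] * len(frame))  # hypothetical neutral fill value
            mask.append(0)
    return masked, mask


# Example: a dynamics curve (dB) where only frames 2 and 3 are specified.
dynamics = [[-30.0], [-25.0], [-20.0], [-18.0], [-22.0], [-28.0]]
masked, mask = apply_partial_control(dynamics, start=2, end=4)
print(mask)    # [0, 0, 1, 1, 0, 0]
print(masked)  # [[0.0], [0.0], [-20.0], [-18.0], [0.0], [0.0]]
```

Returning the mask together with the masked control is one design option. It lets a model distinguish a specified frame whose value happens to equal the fill from a frame that was never specified, which the fill value alone cannot do.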
Findings
Achieves 49% higher fidelity to input melodies compared to MusicGen.
Supports multiple time-varying controls, including controls specified for only part of the clip.
Generates realistic music with fewer parameters and less training data than MusicGen.
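A claim like "49% higher fidelity" is a relative comparison between two melody-adherence scores. The sketch below assumes the score is frame-wise pitch-class accuracy, meaning the fraction of frames where the dominant pitch class of the generated audio matches the melody control. The metric definition, the helper names, and the example numbers are illustrative assumptions, not values from the paper.

```python
from typing import Sequence

def melody_accuracy(control_pcs: Sequence[int], generated_pcs: Sequence[int]) -> float:
    """Fraction of frames whose generated pitch class (0-11) matches the control."""
    if not control_pcs or len(control_pcs) != len(generated_pcs):
        raise ValueError("expected two non-empty sequences of equal length")
    hits = sum(c == g for c, g in zip(control_pcs, generated_pcs))
    return hits / len(control_pcs)


def relative_improvement(ours: float, baseline: float) -> float:
    """Relative gain over a baseline; 0.49 would read as "49% higher"."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (ours - baseline) / baseline


# Hypothetical per-frame pitch classes, not data from the paper.
acc = melody_accuracy([0, 0, 7, 7], [0, 2, 7, 7])
print(acc)                              # 0.75
print(relative_improvement(0.75, 0.5))  # 0.5
```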
Abstract
Text-to-music generation models are now capable of generating high-quality music audio in broad styles. However, text control is primarily suitable for the manipulation of global musical attributes like genre, mood, and tempo, and is less suitable for precise control over time-varying attributes such as the positions of beats in time or the changing dynamics of the music. We propose Music ControlNet, a diffusion-based music generation model that offers multiple precise, time-varying controls over generated audio. To imbue text-to-music models with time-varying control, we propose an approach analogous to pixel-wise control of the image-domain ControlNet method. Specifically, we extract controls from training audio yielding paired data, and fine-tune a diffusion-based conditional generative model over audio spectrograms given melody, dynamics, and rhythm controls. While the image-domain…
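The abstract describes extracting controls from training audio to build paired data. The sketch below shows one plausible way to derive two of them. A dynamics control is computed as smoothed per-frame energy in dB. A melody control is computed as a one-hot of the strongest pitch class per frame of a chromagram. Both assume a precomputed power spectrogram or chromagram. The feature definitions, the moving-average smoother, and the function names are assumptions and may differ from the paper's.

```python
import math
from typing import List, Sequence

def dynamics_control(
    power_spec: Sequence[Sequence[float]], smooth: int = 3, eps: float = 1e-10
) -> List[float]:
    """Per-frame energy in dB, smoothed with a centered moving average.

    power_spec: frames x frequency bins of non-negative power values.
    The moving average is a simple stand-in for whatever smoother the paper uses.
    """
    db = [10.0 * math.log10(sum(frame) + eps) for frame in power_spec]
    half = smooth // 2
    smoothed: List[float] = []
    for t in range(len(db)):
        window = db[max(0, t - half): t + half + 1]  # shrinks at the edges
        smoothed.append(sum(window) / len(window))
    return smoothed


def melody_control(chroma: Sequence[Sequence[float]]) -> List[List[int]]:
    """One-hot vector of the strongest pitch class (12 bins) in each frame."""
    onehots: List[List[int]] = []
    for frame in chroma:
        top = max(range(12), key=lambda k: frame[k])  # first max wins ties
        onehots.append([1 if k == top else 0 for k in range(12)])
    return onehots


# Example: pitch class 7 (G, with 0 = C) dominates frame 0, pitch class 0 (C) frame 1.
chroma = [[0.1] * 12 for _ in range(2)]
chroma[0][7] = 0.9
chroma[1][0] = 0.8
print([row.index(1) for row in melody_control(chroma)])  # [7, 0]
```

The rhythm control (beat and downbeat positions) would typically come from an off-the-shelf beat tracker, so it is not sketched here.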
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
