JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Peike Li, Boyu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang

TL;DR
JEN-1 is a versatile diffusion-based model that generates high-quality music from text prompts and also performs music inpainting and continuation, outperforming existing methods in text-music alignment and efficiency.
Contribution
The paper introduces JEN-1, a novel diffusion model that uses in-context learning to unify multiple text-to-music tasks in a single model while maintaining high fidelity and efficiency.
Findings
JEN-1 outperforms state-of-the-art methods in text-music alignment.
JEN-1 achieves high-fidelity music generation with efficient computation.
JEN-1 demonstrates versatility in various music generation tasks.
Abstract
Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while…
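The abstract says one model handles text-guided generation, inpainting, and continuation, but it does not say how the tasks are expressed to the model. A common way to unify them, assumed here and not confirmed by this summary, is a binary mask over latent audio frames: 1 marks a frame given as context, 0 marks a frame the diffusion model must generate. The functions `task_mask`, `masked_context`, and `merge` are hypothetical names for illustration, and plain lists stand in for real latent tensors.

```python
from typing import List

# Hypothetical sketch: each task is a binary mask over num_frames latent frames.
# 1 = frame is known context (kept), 0 = frame must be generated.
def task_mask(task: str, num_frames: int, start: int = 0, end: int = 0) -> List[int]:
    if task == "generation":
        # Nothing is given: the whole clip is generated from the text prompt.
        return [0] * num_frames
    if task == "continuation":
        # The first `start` frames are given; the remainder is generated.
        return [1 if t < start else 0 for t in range(num_frames)]
    if task == "inpainting":
        # Frames in [start, end) are missing; all other frames are context.
        return [0 if start <= t < end else 1 for t in range(num_frames)]
    raise ValueError(f"unknown task: {task}")


def masked_context(latent: List[float], mask: List[int]) -> List[float]:
    # Zero out frames to be generated so the model only sees the known ones.
    return [x * m for x, m in zip(latent, mask)]


def merge(known: List[float], generated: List[float], mask: List[int]) -> List[float]:
    # Keep known frames verbatim and take the model's output everywhere else.
    return [k if m else g for k, g, m in zip(known, generated, mask)]


if __name__ == "__main__":
    mask = task_mask("inpainting", 6, start=2, end=4)
    print(mask)  # [1, 1, 0, 0, 1, 1]
    print(masked_context([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], mask))
```

Under this assumption, text-only generation is simply the all-zero mask, so the three tasks differ only in the mask passed alongside the text prompt rather than in the model itself.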
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies
Methods: Diffusion
