An End-to-End Approach for Chord-Conditioned Song Generation
Shuochen Gao, Shun Lei, Fan Zhuo, Hangyu Liu, Feng Liu, Boshi Tang, Qiaochu Huang, Shiyin Kang, Zhiyong Wu

TL;DR
This paper introduces a chord-conditioned song generation model that integrates extracted chord information through a robust cross-attention mechanism, improving both the musical quality of generated songs and control over them.
Contribution
The paper presents a novel Chord-Conditioned Song Generator (CSG) that effectively incorporates chord data into song synthesis, addressing limitations of previous methods like Jukebox.
Findings
Outperforms existing methods in musical quality.
Enhances control over generated music.
Reduces frame-level flaws in synthesis.
Abstract
The Song Generation task aims to synthesize music composed of vocals and accompaniment from given lyrics. While the existing method, Jukebox, has explored this task, its constrained control over the generations often leads to deficiencies in musical performance. To mitigate the issue, we introduce an important concept from music composition, namely chords, to song generation networks. Chords form the foundation of accompaniment and provide the vocal melody with associated harmony. Given the inaccuracy of automatic chord extractors, we devise a robust cross-attention mechanism augmented with a dynamic weight sequence to integrate extracted chord information into song generations and reduce frame-level flaws, and propose a novel model termed Chord-Conditioned Song Generator (CSG) based on it. Experimental evidence demonstrates our proposed method outperforms other approaches in terms of musical…
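The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea of cross-attention scaled by a per-frame weight sequence. It assumes song frames act as queries and chord embeddings as keys and values, and that each frame has a weight in [0, 1] deciding how much chord context is mixed in. The function name `weighted_cross_attention`, the residual form, and the choice to pass the weights in as inputs (rather than predict them) are all assumptions for illustration, not the paper's method.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_cross_attention(frames, chords, weights):
    """Hypothetical sketch, not the paper's formulation.

    frames:  T vectors of size d (queries, one per song frame)
    chords:  S vectors of size d (used as both keys and values)
    weights: T floats in [0, 1], one per frame; a low weight
             suppresses chord context where the extracted chord
             is assumed to be unreliable
    """
    d = len(frames[0])
    out = []
    for frame, w in zip(frames, weights):
        # Scaled dot-product scores of this frame against every chord.
        scores = [dot(frame, c) / math.sqrt(d) for c in chords]
        attn = softmax(scores)
        # Attention-weighted average of the chord vectors.
        context = [sum(a * c[i] for a, c in zip(attn, chords)) for i in range(d)]
        # Residual connection, with the chord context scaled per frame.
        out.append([f + w * ctx for f, ctx in zip(frame, context)])
    return out


frames = [[1.0, 0.0], [0.0, 1.0]]
chords = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = weighted_cross_attention(frames, chords, [0.0, 1.0])
```

With a weight of 0 the first frame passes through unchanged, while the second frame, with a weight of 1, receives the full chord context. In a trained model the weights would presumably be produced by the network rather than supplied by hand.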
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Cellular Automata and Applications
Methods: Residual Connection · Dense Connections · Convolution · Dilated Convolution · VQ-VAE · Layer Normalization · Position-Wise Feed-Forward Layer · Jukebox
