SongEcho: Towards Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation
Sifei Li, Yang Li, Zizhou Wang, Yuxin Zhang, Fuzhang Wu, Oliver Deussen, Tong-Yee Lee, Weiming Dong

TL;DR
SongEcho introduces a conditional generative model for cover song creation that keeps the original vocal melody while generating new vocals and accompaniment, using element-wise linear modulation for conditioning and a new dataset.
Contribution
The paper presents SongEcho, a new framework employing Instance-Adaptive Element-wise Linear Modulation for controllable cover song generation, along with a large-scale dataset Suno70k.
Findings
Outperforms existing methods in cover song quality
Requires fewer than 30% of the trainable parameters used by the compared methods
Demonstrates effective controllable generation
Abstract
Cover songs constitute a vital aspect of musical culture, preserving the core melody of an original composition while reinterpreting it to infuse novel emotional depth and thematic emphasis. Although prior research has explored the reinterpretation of instrumental music through melody-conditioned text-to-music models, the task of cover song generation remains largely unaddressed. In this work, we formulate cover song generation as a conditional generation task that simultaneously generates new vocals and accompaniment conditioned on the original vocal melody and text prompts. To this end, we present SongEcho, which leverages Instance-Adaptive Element-wise Linear Modulation (IA-EiLM), a framework that enables controllable generation by improving both the conditioning injection mechanism and the conditional representation. To enhance the conditioning injection mechanism, we extend…
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper formalizes an interesting task, melody-controlled cover song generation, and carefully discusses the limitations of existing condition-injection mechanisms in NAR frameworks. The proposed IA-EiLM achieves superior performance in parameter efficiency, precise temporal control, and melody adherence.
2. The experiments and ablation studies are comprehensive and well-designed.
3. The Suno70k dataset represents a meaningful contribution to the research community.
4. The presentation is cle
1. The paper lacks details on the exact form of melody input. If it only uses pitch sequences, how is the alignment between notes and lyrics ensured? Why was this particular representation chosen, and were other melody representations considered?
2. How does the model handle conflicts between text tags and melody? Since a melody implicitly encodes stylistic attributes, it would be useful to clarify how such inconsistencies are resolved.
3. Although SongEval Aesthetics Metrics are included in t
This paper proposes Instance-Adaptive Element-wise Linear Modulation (IA-EiLM), which comprises the EiLM and Instance-Adaptive Condition Refinement (IACR), enhancing the condition injection mechanism and conditional representation, respectively. This paper introduces Suno70k, an open-source AI song dataset enriched with detailed annotations, including enhanced tags and lyrics.
One of the paper's claimed innovations, EiLM, appears to be a relatively trivial extension of FiLM. This leaves IACR as the paper's primary technical insight, which may render the overall technical novelty somewhat limited.
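The relationship the reviewer describes between FiLM and an element-wise variant can be illustrated with a toy sketch. It assumes that the element-wise form predicts a separate scale and shift for every (time frame, channel) entry, whereas FiLM shares one scale and shift per channel across all frames. The function names and values below are hypothetical and are not taken from the paper, whose actual formulation may differ.

```python
# Toy comparison of FiLM and an element-wise linear modulation.
# Assumption: hidden states are a list of time frames, each a list of channels.

def film(x, gamma, beta):
    """FiLM: one (gamma, beta) pair per channel, shared across all time frames."""
    return [[g * v + b for v, g, b in zip(frame, gamma, beta)] for frame in x]


def elementwise_modulation(x, gamma, beta):
    """Element-wise variant: a separate (gamma, beta) for every (frame, channel) entry."""
    return [
        [g * v + b for v, g, b in zip(frame, g_row, b_row)]
        for frame, g_row, b_row in zip(x, gamma, beta)
    ]


# Hidden states: 3 time frames x 2 channels (made-up values).
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# FiLM applies the same per-channel scale and shift to every frame.
film_out = film(x, gamma=[2.0, 0.5], beta=[0.0, 1.0])

# The element-wise form can treat each frame differently, which is what
# a time-varying condition such as a melody contour would need.
eilm_gamma = [[1.0, 1.0], [0.0, 1.0], [2.0, 2.0]]
eilm_beta = [[0.0, 0.0], [1.0, 0.0], [0.0, -1.0]]
eilm_out = elementwise_modulation(x, eilm_gamma, eilm_beta)

print(film_out)  # [[2.0, 2.0], [6.0, 3.0], [10.0, 4.0]]
print(eilm_out)  # [[1.0, 2.0], [1.0, 4.0], [10.0, 11.0]]
```

Under this assumption, FiLM is the special case in which every frame receives the same scale and shift rows, which is consistent with the reviewer's reading of EiLM as an extension of FiLM.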
- Adaptation of a pretrained linear DiT model to a new control, establishing a new approach for melody conditioning of music generation.
- Two rebranded approaches, EiLM and IACR, for conditioning.
- A new large dataset of full music, including singing voice and lyrics annotations.
- Successful evaluation of the proposed methods.
The style of presentation is rather unclear and confusing: - in the abstract the authors introduce the term cover song generation as: (Line 17) *We formalize this challenge as Cover Song Generation, which requires preserving the source vocal melody while simultaneously synthesizing new vocals and accompaniment, posing higher demands for controllable music generation.* they later change into: (line 40) *...reinterpret the original’s emotional and stylistic core, evolving a gentle country balla
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
