SongEcho: Towards Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation

Sifei Li; Yang Li; Zizhou Wang; Yuxin Zhang; Fuzhang Wu; Oliver Deussen; Tong-Yee Lee; Weiming Dong

arXiv:2602.19976 · cs.SD · February 24, 2026

TL;DR

SongEcho is a conditional generative model for cover song creation: it keeps the original vocal melody while generating new vocals and accompaniment, using Instance-Adaptive Element-wise Linear Modulation (IA-EiLM) and a new dataset, Suno70k.

Contribution

The paper presents SongEcho, a framework that uses Instance-Adaptive Element-wise Linear Modulation (IA-EiLM) for controllable cover song generation, along with a large-scale dataset, Suno70k.

Findings

1. Outperforms existing methods in cover song quality
2. Requires fewer than 30% of trainable parameters
3. Demonstrates effective controllable generation

Abstract

Cover songs constitute a vital aspect of musical culture, preserving the core melody of an original composition while reinterpreting it to infuse novel emotional depth and thematic emphasis. Although prior research has explored the reinterpretation of instrumental music through melody-conditioned text-to-music models, the task of cover song generation remains largely unaddressed. In this work, we reformulate cover song generation as a conditional generation task, which simultaneously generates new vocals and accompaniment conditioned on the original vocal melody and text prompts. To this end, we present SongEcho, which leverages Instance-Adaptive Element-wise Linear Modulation (IA-EiLM), a framework that incorporates controllable generation by improving both the conditioning injection mechanism and the conditional representation. To enhance the conditioning injection mechanism, we extend…

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

1. The paper formalizes an interesting task, melody-controlled cover song generation, and carefully discusses the limitations of existing condition-injection mechanisms in NAR frameworks. The proposed IA-EiLM achieves superior performance in parameter efficiency, precise temporal control, and melody adherence.
2. The experiments and ablation studies are comprehensive and well-designed.
3. The Suno-70k dataset represents a meaningful contribution to the research community.
4. The presentation is cle

Weaknesses

1. The paper lacks details on the exact form of melody input. If it only uses pitch sequences, how is the alignment between notes and lyrics ensured? Why was this particular representation chosen, and were other melody representations considered?
2. How does the model handle conflicts between text tags and melody? Since a melody implicitly encodes stylistic attributes, it would be useful to clarify how such inconsistencies are resolved.
3. Although SongEval Aesthetics Metrics are included in t

Reviewer 02 · Rating 6 · Confidence 4

Strengths

This paper proposes Instance-Adaptive Element-wise Linear Modulation (IA-EiLM), which comprises EiLM and Instance-Adaptive Condition Refinement (IACR); these enhance the condition injection mechanism and the conditional representation, respectively. The paper also introduces Suno70k, an open-source AI song dataset enriched with detailed annotations, including enhanced tags and lyrics.

Weaknesses

One of the paper's claimed innovations, EiLM, appears to be a relatively trivial extension of FiLM. This leaves IACR as the paper's primary technical insight, which may render the overall technical novelty somewhat limited.
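
The relationship the reviewer describes can be illustrated with a minimal sketch. FiLM applies one scale and one shift per feature channel, shared across all positions. The element-wise variant below is an assumption about what "element-wise linear modulation" means here (a separate scale and shift for every time-channel entry); this page does not give the paper's actual formulation. All function names, shapes, and values are hypothetical.

```python
# Hypothetical sketch: names, shapes, and values are illustrative assumptions,
# not taken from the SongEcho paper or its released code.

def film(x, gamma, beta):
    """FiLM: one (gamma, beta) pair per channel, shared across all time steps.

    x:     T x C hidden features, as a list of rows
    gamma: length-C scales; beta: length-C shifts
    """
    return [[g * v + b for v, g, b in zip(row, gamma, beta)] for row in x]


def elementwise_modulation(x, gamma, beta):
    """Assumed element-wise variant: a separate (gamma, beta) per (t, c) entry.

    gamma and beta have the same T x C shape as x, so a time-aligned
    condition (for example a frame-level melody feature) can steer each frame.
    """
    return [
        [g * v + b for v, g, b in zip(x_row, g_row, b_row)]
        for x_row, g_row, b_row in zip(x, gamma, beta)
    ]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # T=3 frames, C=2 channels

# FiLM: every frame receives the same per-channel scale and shift.
film_out = film(x, gamma=[2.0, 0.5], beta=[0.0, 1.0])

# Element-wise: the modulation itself varies from frame to frame.
gamma_tc = [[1.0, 1.0], [2.0, 0.5], [0.0, 1.0]]
beta_tc = [[0.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
eilm_out = elementwise_modulation(x, gamma_tc, beta_tc)
```

Under this reading, FiLM is the special case in which every row of the scale and shift tensors is identical, which is consistent with the reviewer's view of EiLM as an extension of FiLM. How the scales and shifts are predicted from the melody condition is not sketched here.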

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- Adaptation of a pretrained linear DiT model to a new control, establishing a new approach for melody conditioning of music generation.
- Two rebranded approaches, EiLM and IACR, for conditioning.
- A new large dataset of full music, including singing voice and lyrics annotations.
- Successful evaluation of the proposed methods.

Weaknesses

The style of presentation is rather unclear and confusing:
- In the abstract the authors introduce the term cover song generation as: (Line 17) *We formalize this challenge as Cover Song Generation, which requires preserving the source vocal melody while simultaneously synthesizing new vocals and accompaniment, posing higher demands for controllable music generation.* They later change this to: (line 40) *...reinterpret the original’s emotional and stylistic core, evolving a gentle country balla

Code & Models

Models

🤗 lsfhuihuiff/SongEcho
model

Datasets

lsfhuihuiff/suno70k
dataset · 39 dl

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis