MIDI-Informed Singing Accompaniment Generation in a Compositional Song Pipeline

Fang-Duo Tsai; Yi-An Lai; Fei-Yueh Chen; Hsueh-Wei Fu; Wei-Jaw Lee; Hao-Chung Cheng; Yi-Hsuan Yang

arXiv:2602.22029·cs.SD·May 6, 2026

TL;DR

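This paper introduces MIDI-SAG, a score-to-song system for long-form singing accompaniment generation. It uses symbolic timing and chord information derived from the vocal MIDI, together with structure planning, to keep generation coherent across both vocal and vocal-silent sections while the songwriter retains authorship of the melody.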

Contribution

MIDI-SAG is a novel framework that incorporates symbolic timing and structure planning for stable, long-form, score-to-song generation, trained efficiently with pre-trained modules.

Findings

01 MIDI-SAG can generate coherent long-form singing accompaniments.

02 The system effectively integrates MIDI and chord data for structured music synthesis.

03 Pre-trained modules enable data-efficient training on a single GPU.

Abstract

While end-to-end lyrics-to-song models offer convenience for casual users, professional songwriters require score-to-song systems that allow them to retain authorship over the core melody. However, existing score-to-song methods are limited to short-form snippets and fail to maintain coherence in long-form generation, particularly during vocal-silent sections like intros and bridges. To address this long-form bottleneck, we propose MIDI-informed singing accompaniment generation (MIDI-SAG). Unlike conventional audio-only models, MIDI-SAG utilizes symbolic timing and chord information derived from the vocal MIDI to provide a stable musical roadmap. By incorporating structure planning, which defines temporal boundaries and semantic labels, our framework facilitates consistent generation across both vocal and non-vocal sections. We demonstrate the feasibility of this compositional pipeline…
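
The abstract describes structure planning as defining temporal boundaries and semantic labels for both vocal and non-vocal sections. A minimal sketch of what such a plan could look like as a data structure is given below. Everything in it is an assumption made for illustration: the names Section, validate_plan, and section_at, the field layout, and the example timings are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Section:
    """One entry in a hypothetical structure plan (not the paper's format)."""
    label: str        # semantic label, e.g. "intro", "verse", "bridge"
    start_sec: float  # temporal boundary: where the section starts
    end_sec: float    # temporal boundary: where the section ends
    has_vocal: bool   # False for vocal-silent sections such as intros


def validate_plan(plan: List[Section]) -> None:
    """Raise ValueError unless sections are non-empty and contiguous."""
    for section in plan:
        if section.end_sec <= section.start_sec:
            raise ValueError(f"empty or reversed section: {section.label}")
    for prev, cur in zip(plan, plan[1:]):
        if prev.end_sec != cur.start_sec:
            raise ValueError(f"gap or overlap between {prev.label} and {cur.label}")


def section_at(plan: List[Section], t: float) -> Optional[Section]:
    """Return the section covering time t (start inclusive, end exclusive)."""
    for section in plan:
        if section.start_sec <= t < section.end_sec:
            return section
    return None


# Illustrative timings only; these are made-up values, not data from the paper.
EXAMPLE_PLAN = [
    Section("intro", 0.0, 8.0, has_vocal=False),
    Section("verse", 8.0, 40.0, has_vocal=True),
    Section("bridge", 40.0, 56.0, has_vocal=False),
    Section("chorus", 56.0, 88.0, has_vocal=True),
]

validate_plan(EXAMPLE_PLAN)
```

In a sketch like this, a generator could call section_at for each time window to decide whether it is conditioning on a vocal section or filling a vocal-silent one. How MIDI-SAG actually represents and consumes its plan is not stated in the abstract.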

Code & Models

Repositories

https://composerflow.github.io/web_revealed
github
