Streaming Generation for Music Accompaniment
Yusong Wu, Mason Wang, Heidi Lei, Stephen Brade, Lancelot Blanchard, Shih-Lun Wu, Aaron Courville, Anna Huang

TL;DR
This paper introduces a real-time audio-to-audio music accompaniment model that accounts for system delays, balances latency, coherence, and throughput, and argues that live performance scenarios call for more advanced training objectives.
Contribution
It presents a novel model design considering system delays, explores the trade-offs between future visibility and chunk size, and highlights the need for anticipatory training objectives for coherent live accompaniment.
Findings
Increasing future visibility improves coherence but demands faster inference.
Larger output chunks increase throughput but reduce update frequency and quality.
Naive training methods are insufficient for real-time coherent accompaniment.
Abstract
Music generation models can produce high-fidelity coherent accompaniment given complete audio input, but are limited to editing and loop-based workflows. We study real-time audio-to-audio accompaniment: as a model hears an input audio stream (e.g., a singer singing), it must simultaneously generate, in real time, a coherent accompanying stream (e.g., a guitar accompaniment). In this work, we propose a model design that accounts for the inevitable system delays of practical deployment, with two design variables: future visibility, the offset between the output playback time and the latest input time used for conditioning, and output chunk duration, the number of frames emitted per call. We train Transformer decoders across a grid of these two variables and show two consistent trade-offs: increasing effective future visibility improves coherence by reducing the recency gap, but requires faster inference to…
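The chunk-size trade-off can be illustrated with some simple timing arithmetic. The sketch below is not the paper's method: the names `StreamConfig` and `keeps_up` and the 50 Hz frame rate are hypothetical, and it models only steady-state throughput (each call must finish before the previously emitted chunk has played out), not future visibility or other system delays.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical streaming setup; all values are illustrative assumptions."""
    frame_rate_hz: float  # audio frames per second (assumed, not from the paper)
    chunk_frames: int     # frames emitted per model call

    @property
    def chunk_seconds(self) -> float:
        # Playback time covered by one emitted chunk.
        return self.chunk_frames / self.frame_rate_hz

    @property
    def updates_per_second(self) -> float:
        # How often the model can react to new input.
        return self.frame_rate_hz / self.chunk_frames


def keeps_up(cfg: StreamConfig, inference_seconds: float) -> bool:
    """True if one call finishes within the playback time of one chunk."""
    return inference_seconds <= cfg.chunk_seconds


small = StreamConfig(frame_rate_hz=50.0, chunk_frames=5)
large = StreamConfig(frame_rate_hz=50.0, chunk_frames=10)

for cfg in (small, large):
    print(cfg.chunk_frames, cfg.chunk_seconds, cfg.updates_per_second,
          keeps_up(cfg, inference_seconds=0.15))
```

Under these assumed numbers, a call that takes 0.15 s keeps up with the 10-frame chunk but not the 5-frame chunk, while the larger chunk halves the number of updates per second. This mirrors the stated trade-off between throughput and update frequency.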
