SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias
Zihao Wang, Qihao Liang, Kejun Zhang, Yuxing Wang, Chen Zhang, Pengfei Yu, Yongsheng Feng, Wenbo Liu, Yikai Wang, Yuntai Bao, Yiheng Yang

TL;DR
SongDriver is a real-time music accompaniment system that eliminates logical latency and exposure bias by dividing the task into arrangement and prediction phases, using Transformer and CRF models, and incorporating global musical features.
Contribution
This paper introduces SongDriver, a novel two-phase system that achieves zero logical latency and avoids exposure bias in real-time music accompaniment generation.
Findings
Outperforms state-of-the-art models on objective metrics
Reduces physical latency significantly
Effectively incorporates long-term musical features
Abstract
Real-time music accompaniment generation has a wide range of applications in the music industry, such as music education and live performances. However, automatic real-time music accompaniment generation is still understudied and often faces a trade-off between logical latency and exposure bias. In this paper, we propose SongDriver, a real-time music accompaniment generation system without logical latency nor exposure bias. Specifically, SongDriver divides one accompaniment generation task into two phases: 1) The arrangement phase, where a Transformer model first arranges chords for input melodies in real-time, and caches the chords for the next phase instead of playing them out. 2) The prediction phase, where a CRF model generates playable multi-track accompaniments for the coming melodies based on previously cached chords. With this two-phase strategy, SongDriver directly generates…
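The two-phase loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `run_two_phase`, `toy_arrange`, and `toy_predict`, the cache size, and the MIDI-pitch melody encoding are all assumptions, and the toy functions merely stand in for the Transformer and CRF models.

```python
from collections import deque
from typing import Callable, Deque, List

Melody = List[int]  # one melody step as MIDI pitches (assumed encoding)
Chord = str

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def toy_arrange(step: Melody) -> Chord:
    """Hypothetical stand-in for the Transformer: name a chord after the first pitch."""
    return NOTE_NAMES[step[0] % 12]


def toy_predict(cached: List[Chord]) -> Chord:
    """Hypothetical stand-in for the CRF: reuse the most recent cached chord."""
    return cached[-1] if cached else "N.C."  # "N.C." = no chord yet


def run_two_phase(
    melody_steps: List[Melody],
    arrange: Callable[[Melody], Chord] = toy_arrange,
    predict: Callable[[List[Chord]], Chord] = toy_predict,
    cache_size: int = 4,  # assumed value, not taken from the paper
) -> List[Chord]:
    cache: Deque[Chord] = deque(maxlen=cache_size)
    played: List[Chord] = []
    for step in melody_steps:
        # Prediction phase: the accompaniment for the incoming step is built
        # only from previously cached chords, so it does not wait on this step.
        played.append(predict(list(cache)))
        # Arrangement phase: arrange a chord for the melody just heard and
        # cache it for later predictions instead of playing it.
        cache.append(arrange(step))
    return played
```

For example, `run_two_phase([[60], [62], [64]])` plays a placeholder on the first step and, on each later step, a chord derived from the melody heard one step earlier.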
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Absolute Position Encodings · Dropout · Dense Connections · Residual Connection
