DTT-BSR: GAN-based DTTNet with RoPE Transformer Enhancement for Music Source Restoration
Shihong Tan, Haoyu Wang, Youran Ni, Yingzhao Hou, Jiayue Luo, Zipei Hu, Han Dou, Zerui Han, Ningning Pan, Yuzhu Wang, Gongping Huang

TL;DR
This paper introduces DTT-BSR, a GAN-based model that pairs a RoPE transformer with a dual-path band-split RNN for music source restoration. With 7.1M parameters, it placed 3rd on the objective leaderboard and 4th on the subjective leaderboard of the ICASSP 2026 MSR Challenge.
Contribution
It presents a novel hybrid GAN architecture that combines a RoPE transformer with a dual-path band-split RNN for music source restoration.
Findings
Achieved 3rd place on the objective leaderboard of the ICASSP 2026 MSR Challenge.
Achieved 4th place on the subjective leaderboard of the same challenge.
The model is compact, with 7.1 million parameters.
Abstract
Music source restoration (MSR) aims to recover unprocessed stems from mixed and mastered recordings. The challenge lies in both separating overlapping sources and reconstructing signals degraded by production effects such as compression and reverberation. We therefore propose DTT-BSR, a hybrid generative adversarial network (GAN) that combines a rotary positional embedding (RoPE) transformer for long-term temporal modeling with a dual-path band-split recurrent neural network (RNN) for multi-resolution spectral processing. Our model achieved 3rd place on the objective leaderboard and 4th place on the subjective leaderboard of the ICASSP 2026 MSR Challenge, demonstrating exceptional generation fidelity and semantic alignment with a compact size of 7.1M parameters.
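The abstract names rotary positional embeddings as the mechanism for long-term temporal modeling but does not describe them further. The following is a minimal sketch of the general RoPE idea in plain Python, not the authors' implementation: each pair of feature channels is rotated by an angle proportional to the frame position, so the dot product between a rotated query and a rotated key depends only on their relative offset. The function names, the example vectors, and the base of 10000 are illustrative assumptions, not values taken from the paper.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Apply a rotary positional embedding to one even-length feature vector.

    Hypothetical helper for illustration; `base` is the commonly used
    default, assumed here rather than taken from the paper.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even feature dimension")
    out = []
    for i in range(0, dim, 2):
        # Each consecutive channel pair gets its own rotation frequency.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + 1]
        out.extend([x1 * cos_t - x2 * sin_t, x1 * sin_t + x2 * cos_t])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Arbitrary example query and key vectors (assumed values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# The same 3-frame offset at two different absolute positions.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_far = dot(rope_rotate(q, 105), rope_rotate(k, 102))
```

Under these assumptions, `score_near` and `score_far` agree up to floating-point error, which is the relative-position property that makes RoPE attractive for long audio sequences where absolute frame indices carry little meaning.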
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
