Multi-Stage Music Source Restoration with BandSplit-RoFormer Separation and HiFi++ GAN
Tobias Morocutti, Emmanouil Karystinaios, Jonathan Greif, Gerhard Widmer

TL;DR
This paper introduces a multi-stage approach to music source restoration that pairs a BandSplit-RoFormer separator with a HiFi++ GAN waveform restorer to recover original, unprocessed instrument stems from fully mixed and mastered audio.
Contribution
It presents a multi-stage system that integrates a transformer-based separator with a GAN-based waveform restorer for music source restoration.
Findings
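A single BandSplit-RoFormer separator predicts eight stems plus an auxiliary "other" stem.
The separator is trained with a three-stage curriculum that progresses from 4-stem warm-start fine-tuning (with LoRA) to 8-stem extension via head expansion.
The HiFi++ GAN waveform restorer is trained as a generalist and then specialized into eight instrument-specific experts.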
Abstract
Music Source Restoration (MSR) targets recovery of original, unprocessed instrument stems from fully mixed and mastered audio, where production effects and distribution artifacts violate common linear-mixture assumptions. This technical report presents the CP-JKU team's system for the MSR ICASSP Challenge 2025. Our approach decomposes MSR into separation and restoration. First, a single BandSplit-RoFormer separator predicts eight stems plus an auxiliary other stem, and is trained with a three-stage curriculum that progresses from 4-stem warm-start fine-tuning (with LoRA) to 8-stem extension via head expansion. Second, we apply a HiFi++ GAN waveform restorer trained as a generalist and then specialized into eight instrument-specific experts.
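The abstract mentions extending a 4-stem separator to 8 stems "via head expansion" but does not describe the mechanism. The following is only a minimal sketch of one plausible reading, in which each new output head is initialized as a copy of a pretrained head. The stem names, the `INIT_FROM` mapping, and the `expand_heads` helper are all hypothetical and are not taken from the report.

```python
import copy

# Hypothetical stem names: the abstract only gives the counts
# (4 stems, then 8 stems plus an auxiliary "other" stem).
FOUR_STEMS = ["vocals", "drums", "bass", "other"]
NINE_STEMS = ["vocals", "guitars", "bass", "keyboards", "synthesizers",
              "drums", "percussion", "orchestral", "other"]

# Assumed mapping: which pretrained 4-stem head seeds each new head.
INIT_FROM = {
    "vocals": "vocals",
    "bass": "bass",
    "drums": "drums",
    "percussion": "drums",
    "guitars": "other",
    "keyboards": "other",
    "synthesizers": "other",
    "orchestral": "other",
    "other": "other",
}


def expand_heads(old_heads, new_stems, init_from):
    """Build one head per new stem, each an independent copy of its seed head."""
    new_heads = {}
    for stem in new_stems:
        seed = init_from[stem]
        # Deep copy so that later fine-tuning of one head does not alter another.
        new_heads[stem] = copy.deepcopy(old_heads[seed])
    return new_heads


# Toy "heads": small weight matrices standing in for real output layers.
old_heads = {
    name: {"weight": [[float(i)] * 3 for _ in range(2)], "bias": [0.0, 0.0]}
    for i, name in enumerate(FOUR_STEMS)
}
new_heads = expand_heads(old_heads, NINE_STEMS, INIT_FROM)
```

Under this assumption the shared backbone keeps its pretrained weights and every new head starts from a sensible initialization, so 8-stem training does not begin from scratch. In a real framework the same idea would apply to the model's actual output layers rather than plain dictionaries.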
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies
