Towards Practical Real-Time Low-Latency Music Source Separation

Junyu Wu; Jie Liu; Tianrui Pan; Jie Tang; Gangshan Wu

arXiv:2511.13146·cs.SD·November 18, 2025

TL;DR

This paper presents RT-STT, a lightweight model for real-time, low-latency music source separation, aimed at applications such as hearing aids, audio stream remixing, and live performances.

Contribution

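The paper introduces RT-STT (Real-Time Single-Path TFC-TDF UNET), a lightweight model derived from the Dual-Path TFC-TDF UNET (DTTNet). It proposes a feature fusion technique based on channel expansion, uses single-path rather than dual-path modeling, and investigates quantization to further reduce inference time.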

Findings

01. RT-STT outperforms existing models in speed and parameter efficiency.

02. Single-path modeling is superior to dual-path in real-time scenarios.

03. Quantization further reduces inference time without sacrificing accuracy.

Abstract

In recent years, significant progress has been made in the field of deep learning for music demixing. However, there has been limited attention on real-time, low-latency music demixing, which holds potential for various applications, such as hearing aids, audio stream remixing, and live performances. Additionally, a notable tendency has emerged towards the development of larger models, limiting their applicability in certain scenarios. In this paper, we introduce a lightweight real-time low-latency model called Real-Time Single-Path TFC-TDF UNET (RT-STT), which is based on the Dual-Path TFC-TDF UNET (DTTNet). In RT-STT, we propose a feature fusion technique based on channel expansion. We also demonstrate the superiority of single-path modeling over dual-path modeling in real-time models. Moreover, we investigate the method of quantization to further reduce inference time. RT-STT…
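
The abstract does not say which quantization scheme the authors use, so the following is only a generic illustration of the idea, not the paper's method. It is a minimal sketch of symmetric per-tensor int8 weight quantization in plain Python; the function names and the example weights are hypothetical. Storing weights as 8-bit integers plus a single scale factor is one common way to let inference run on cheaper integer arithmetic.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Returns the integer codes and the scale needed to map them back.
    (Illustrative helper, not taken from the paper.)
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to exercise the functions.
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

The round-trip error per weight is at most half a quantization step (`scale / 2`), which is why small weights such as `0.003` collapse to zero when the tensor also contains much larger values. Whether such error affects separation quality depends on the model and is something the paper evaluates empirically.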

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Generative Adversarial Networks and Image Synthesis