Towards Practical Real-Time Low-Latency Music Source Separation
Junyu Wu, Jie Liu, Tianrui Pan, Jie Tang, Gangshan Wu

TL;DR
This paper presents RT-STT, a lightweight music source separation model built for real-time, low-latency use in practical settings such as hearing aids, audio stream remixing, and live performances.
Contribution
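The paper introduces RT-STT (Real-Time Single-Path TFC-TDF UNET), a lightweight model derived from the Dual-Path TFC-TDF UNET (DTTNet). It adds a feature fusion technique based on channel expansion and investigates quantization, advancing real-time music demixing with fewer parameters and faster inference.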
Findings
RT-STT outperforms existing models in speed and parameter efficiency.
Single-path modeling is superior to dual-path in real-time scenarios.
Quantization further reduces inference time without sacrificing accuracy.
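To make the quantization finding concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization. It is an assumption for illustration only: this summary does not say which quantization scheme RT-STT uses, and the function names (`quantize_int8`, `dequantize`) and the sample weights are hypothetical.

```python
# Generic symmetric int8 quantization sketch; NOT taken from the RT-STT paper.

def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to keep the arithmetic easy to follow.
weights = [0.50, -1.27, 0.03, 0.90]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Storing and multiplying 8-bit integers instead of 32-bit floats is what typically cuts memory traffic and inference time. The rounding error, at most half of `scale` per weight, is why accuracy can stay close to that of the unquantized model.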
Abstract
In recent years, significant progress has been made in the field of deep learning for music demixing. However, there has been limited attention on real-time, low-latency music demixing, which holds potential for various applications, such as hearing aids, audio stream remixing, and live performances. Additionally, a notable tendency has emerged towards the development of larger models, limiting their applicability in certain scenarios. In this paper, we introduce a lightweight real-time low-latency model called Real-Time Single-Path TFC-TDF UNET (RT-STT), which is based on the Dual-Path TFC-TDF UNET (DTTNet). In RT-STT, we propose a feature fusion technique based on channel expansion. We also demonstrate the superiority of single-path modeling over dual-path modeling in real-time models. Moreover, we investigate the method of quantization to further reduce inference time. RT-STT…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Generative Adversarial Networks and Image Synthesis
