Multi-scale temporal-frequency attention for music source separation

Lianwu Chen; Xiguang Zheng; Chen Zhang; Liang Guo; Bing Yu

arXiv:2209.00805 · eess.AS · September 5, 2022 · ICME

TL;DR

This paper introduces a multi-scale temporal-frequency attention module for music source separation. The module explicitly models spectrogram correlations along the temporal and frequency dimensions, and the paper reports state-of-the-art vocal separation on MUSDB18.

Contribution

The paper proposes an attention mechanism that captures multi-scale temporal and frequency correlations in spectrograms for music source separation.

Findings

01 Outperforms existing methods with 9.51 dB SDR on vocal separation

02 Effectively models spectrogram correlations across multiple scales

03 Achieves state-of-the-art performance on MUSDB18 dataset
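
For readers unfamiliar with the metric behind the 9.51 dB figure, the sketch below computes a plain signal-to-distortion ratio, 10·log10(‖s‖² / ‖s − ŝ‖²), for a reference stem s and an estimate ŝ. This is the basic textbook definition only; the function name `sdr_db` and the `eps` guard are illustrative assumptions, and the MUSDB18 benchmark numbers are normally produced by a dedicated evaluation toolkit with framing and aggregation that this sketch does not reproduce.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-10):
    """Plain SDR in dB between a reference stem and its estimate.

    Assumption: this is the simple energy-ratio definition, not the
    framewise, median-aggregated score used for MUSDB18 leaderboards.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    # eps avoids division by zero and log(0) for silent or perfect inputs.
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))

# Usage: an estimate whose error is 10% of the reference amplitude
# has an error energy 100 times smaller than the signal energy.
t = np.linspace(0.0, 1.0, 1000)
vocals = np.sin(2.0 * np.pi * 5.0 * t)
print(sdr_db(vocals, 0.9 * vocals))
```

Higher is better: every 10 dB corresponds to a tenfold reduction in error energy relative to the reference.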

Abstract

In recent years, deep neural network (DNN) based approaches have achieved state-of-the-art performance for music source separation (MSS). Although previous methods have addressed large receptive field modeling in various ways, the temporal and frequency correlations of the music spectrogram with repeated patterns have not been explicitly explored for the MSS task. In this paper, a temporal-frequency attention module is proposed to model the spectrogram correlations along both temporal and frequency dimensions. Moreover, a multi-scale attention is proposed to effectively capture the correlations in music signals. The experimental results on the MUSDB18 dataset show that the proposed method outperforms the existing state-of-the-art systems with 9.51 dB signal-to-distortion ratio (SDR) on separating the vocal stems, which is the primary practical application of MSS.
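
The idea of attending along the temporal and frequency dimensions can be illustrated with a small NumPy sketch. It is one plausible reading of the abstract, not the paper's architecture: the function names, the use of unprojected features as queries, keys and values, the residual sum, and the average-pool/upsample scheme for multiple scales are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def axis_attention(tokens):
    """Scaled dot-product self-attention over tokens of shape (..., N, C).

    Assumption: queries, keys and values are the raw features; a trained
    model would use learned projections instead.
    """
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(tokens.shape[-1])
    return softmax(scores, axis=-1) @ tokens

def temporal_frequency_attention(feat):
    """feat: (C, F, T) spectrogram features. Returns the same shape."""
    # Temporal attention: within each frequency bin, time frames attend to each other.
    t_tokens = np.transpose(feat, (1, 2, 0))                    # (F, T, C)
    t_out = np.transpose(axis_attention(t_tokens), (2, 0, 1))   # (C, F, T)
    # Frequency attention: within each time frame, frequency bins attend to each other.
    f_tokens = np.transpose(feat, (2, 1, 0))                    # (T, F, C)
    f_out = np.transpose(axis_attention(f_tokens), (2, 1, 0))   # (C, F, T)
    return feat + t_out + f_out  # residual sum (assumed fusion rule)

def multi_scale_attention(feat, scales=(1, 2, 4)):
    """Hypothetical multi-scale wrapper: pool, attend, upsample, average."""
    feat = np.asarray(feat, dtype=np.float64)
    C, F, T = feat.shape
    out = np.zeros_like(feat)
    for k in scales:
        # Assumption: F and T are divisible by every scale factor.
        assert F % k == 0 and T % k == 0
        pooled = feat.reshape(C, F // k, k, T // k, k).mean(axis=(2, 4))
        attended = temporal_frequency_attention(pooled)
        # Nearest-neighbour upsampling back to (C, F, T).
        out += np.repeat(np.repeat(attended, k, axis=1), k, axis=2)
    return out / len(scales)

# Usage: 4 channels, 8 frequency bins, 16 time frames.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 16))
print(multi_scale_attention(features).shape)
```

Attending along one axis at a time keeps each attention matrix at T×T or F×F rather than (F·T)×(F·T), which is what makes this kind of attention tractable on full spectrograms; coarser scales let repeated patterns that are far apart interact at lower cost.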

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques