MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement

Weiming Xu; Zhouxuan Chen; Zhili Tan; Shubo Lv; Runduo Han; Wenjiang Zhou; Weifeng Zhao; Lei Xie

arXiv:2310.04369 · cs.SD · October 9, 2023

TL;DR

MBTFNet is a novel neural network designed specifically for singing voice enhancement, effectively removing background music, noise, and backing vocals by combining multi-band processing and personalized enhancement techniques.

Contribution

The paper introduces MBTFNet, a multi-band temporal-frequency neural network that combines inter- and intra-band modeling with dual-path modeling, plus an implicit personalized enhancement (IPE) stage, for improved singing voice enhancement.

Findings

01 Outperforms state-of-the-art speech enhancement models
02 Effectively removes background music and noise
03 Enhances singing voice clarity

Abstract

A typical neural speech enhancement (SE) approach mainly handles mixtures of speech and noise, which is not optimal for singing voice enhancement. Music source separation (MSS) models treat vocals and the various accompaniment components equally, which may reduce performance compared to a model that considers only vocal enhancement. In this paper, we propose a novel multi-band temporal-frequency neural network (MBTFNet) for singing voice enhancement, which specifically removes background music, noise, and even backing vocals from singing recordings. MBTFNet combines inter- and intra-band modeling for better processing of full-band signals. Dual-path modeling is introduced to expand the receptive field of the model. We propose an implicit personalized enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation, which further improves the performance of MBTFNet.…
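The abstract does not spell out MBTFNet's band layout or its SNR estimator, so what follows is only a minimal Python sketch of two ingredients it names, under stated assumptions. split_bands and merge_bands (hypothetical helpers) cut a (time, frequency) spectrogram into equal-width sub-bands, giving a layout in which intra-band modeling would run over the bins inside each band and inter-band modeling over the band axis; the equal widths and the band count are assumptions, not the paper's configuration. snr_db (also hypothetical) computes the kind of SNR value an SNR-estimation stage such as IPE could be trained to predict, assumed here to be the energy of the target vocal relative to everything else in the mixture (accompaniment, noise, backing vocals).

```python
import numpy as np


def split_bands(spec: np.ndarray, num_bands: int) -> np.ndarray:
    """Split a (time, freq) spectrogram into (num_bands, time, band_width).

    Hypothetical helper: assumes equal-width bands, so freq must be
    divisible by num_bands. MBTFNet's real band split may differ.
    """
    time, freq = spec.shape
    if freq % num_bands != 0:
        raise ValueError("number of frequency bins must be divisible by num_bands")
    width = freq // num_bands
    # (time, freq) -> (time, num_bands, width) -> (num_bands, time, width):
    # the last axis holds the bins inside one band (intra-band view),
    # the first axis indexes the bands themselves (inter-band view).
    return spec.reshape(time, num_bands, width).transpose(1, 0, 2)


def merge_bands(bands: np.ndarray) -> np.ndarray:
    """Inverse of split_bands: (num_bands, time, width) -> (time, num_bands * width)."""
    num_bands, time, width = bands.shape
    return bands.transpose(1, 0, 2).reshape(time, num_bands * width)


def snr_db(target: np.ndarray, mixture: np.ndarray, eps: float = 1e-8) -> float:
    """SNR in dB of `target` within `mixture` (hypothetical IPE-style target).

    Everything that is not the target vocal (music, noise, backing vocals)
    is treated as interference: residual = mixture - target.
    """
    residual = mixture - target
    return float(10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(residual**2) + eps)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.standard_normal((100, 256))           # 100 frames, 256 frequency bins
    bands = split_bands(spec, num_bands=4)
    print(bands.shape)                                # (4, 100, 64)
    assert np.array_equal(merge_bands(bands), spec)   # split/merge round-trips exactly

    vocal = np.ones(1000)
    mixture = vocal + 0.1                             # constant interference at 1/10 amplitude
    print(round(snr_db(vocal, mixture), 2))           # 20.0
```

Because splitting and merging round-trip exactly, a band-wise model could process sub-bands independently and reassemble a full-band output; a non-uniform split would work the same way with per-band slicing instead of a single reshape.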

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing