Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation

Adam Sorrenti

arXiv:2405.20059 · cs.SD · May 31, 2024


TL;DR

This paper presents a U-Net-based neural network approach for accurately separating singing voices from musical tracks using spectrogram analysis, achieving high SDR, SIR, and SAR scores on the MUSDB18 dataset.

Contribution

It introduces a novel application of U-Net with frequency normalization and MAE loss for vocal separation, outperforming previous methods.
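The two ingredients named above can be illustrated with a short NumPy sketch. It assumes that "frequency normalization" means Min/Max-scaling each frequency bin of a (frequency bins × frames) magnitude spectrogram to [0, 1] using that bin's own minimum and maximum over time. The paper may compute these statistics differently, for example over a whole dataset. The helper names `minmax_normalize_freq`, `denormalize_freq` and `mae_loss` are hypothetical, not taken from the author's repository.

```python
import numpy as np

def minmax_normalize_freq(spec: np.ndarray, eps: float = 1e-8):
    """Min/Max-scale a (freq_bins, frames) magnitude spectrogram bin by bin.

    ASSUMPTION: "frequency-axis normalization" is read here as scaling each
    frequency bin (row) to [0, 1] with that row's own min and max over time.
    """
    lo = spec.min(axis=1, keepdims=True)  # per-bin minimum, shape (freq_bins, 1)
    hi = spec.max(axis=1, keepdims=True)  # per-bin maximum, shape (freq_bins, 1)
    scaled = (spec - lo) / (hi - lo + eps)  # eps avoids division by zero for constant bins
    return scaled, lo, hi

def denormalize_freq(scaled: np.ndarray, lo: np.ndarray, hi: np.ndarray, eps: float = 1e-8):
    """Invert minmax_normalize_freq (e.g. on a network's output) using the stored statistics."""
    return scaled * (hi - lo + eps) + lo

def mae_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean Absolute Error: the average of |pred - target| over all time-frequency cells."""
    return float(np.mean(np.abs(pred - target)))

# Usage on a toy spectrogram whose bins have very different scales.
rng = np.random.default_rng(0)
spec = rng.random((4, 10)) * np.array([[1.0], [10.0], [100.0], [1000.0]])
scaled, lo, hi = minmax_normalize_freq(spec)
restored = denormalize_freq(scaled, lo, hi)
loss = mae_loss(scaled, np.zeros_like(scaled))
```

Per-bin scaling keeps low-energy high-frequency bins from being swamped by loud low-frequency bins. Because the per-bin statistics are kept, a normalized prediction can be mapped back to magnitudes. MAE (an L1 loss) penalizes large errors less steeply than a squared-error loss.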

Findings

01

Achieved an SDR of 7.1 dB, indicating high separation quality.

02

Recorded an SIR of 25.2 dB and an SAR of 7.2 dB, surpassing the other configurations tested.

03

Demonstrated the effectiveness of frequency normalization and MAE loss in vocal segmentation.
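The three metrics reported above come from the BSS Eval decomposition of an estimate into a target part, an interference error and an artifact error. The sketch below simplifies that decomposition by letting the allowed distortion of the target be a single gain. Standard BSS Eval (as used for MUSDB18, e.g. through the museval toolkit) also allows a short time-invariant filter and aggregates framewise scores, so its numbers will not match the paper's 7.1 / 25.2 / 7.2 dB. The function name `bss_eval_gain_only` and the toy signals are hypothetical.

```python
import numpy as np

def bss_eval_gain_only(estimate: np.ndarray, references: np.ndarray, j: int):
    """Return (SDR, SIR, SAR) in dB for source j, BSS Eval style.

    Simplification (assumption): the allowed distortion of the target is a
    single gain, not the time-invariant FIR filter used by BSS Eval v3/v4.
    estimate:   (n_samples,) estimated signal, e.g. the separated vocals
    references: (n_sources, n_samples) true sources, e.g. vocals and accompaniment
    """
    s = references[j]
    # Project the estimate onto the target source: the "allowed" part.
    s_target = (estimate @ s) / (s @ s) * s
    # Project the estimate onto the span of all true sources (least squares).
    coeffs, *_ = np.linalg.lstsq(references.T, estimate, rcond=None)
    p_all = references.T @ coeffs
    e_interf = p_all - s_target  # leakage from the other sources
    e_artif = estimate - p_all   # whatever no true source can explain

    def ratio_db(num: np.ndarray, den: np.ndarray) -> float:
        return float(10.0 * np.log10(np.sum(num ** 2) / np.sum(den ** 2)))

    sdr = ratio_db(s_target, e_interf + e_artif)
    sir = ratio_db(s_target, e_interf)
    sar = ratio_db(s_target + e_interf, e_artif)
    return sdr, sir, sar

# Usage: vocals with 10% accompaniment bleed plus a little broadband noise.
rng = np.random.default_rng(0)
vocals, accomp, noise = rng.standard_normal((3, 1000))
estimate = vocals + 0.1 * accomp + 0.01 * noise
sdr, sir, sar = bss_eval_gain_only(estimate, np.stack([vocals, accomp]), j=0)
```

Because the interference and artifact errors are orthogonal by construction, SDR can never exceed SIR or SAR. A high SIR with a much lower SAR, as in the findings above, points to little leakage from the accompaniment but noticeable processing artifacts.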

Abstract

Separating vocal elements from musical tracks is a longstanding challenge in audio signal processing. This study addresses the separation of vocal components from musical spectrograms. We employ the Short-Time Fourier Transform (STFT) to convert audio waveforms into detailed time-frequency spectrograms, using the benchmark MUSDB18 dataset for music separation. We then apply a U-Net neural network to segment the spectrogram image, aiming to delineate and extract the singing-voice components accurately. Our U-Net-based models achieved notable results in audio source separation. The combination of frequency-axis normalization with Min/Max scaling and the Mean Absolute Error (MAE) loss function achieved the highest Source-to-Distortion Ratio (SDR) among the tested configurations, 7.1 dB, indicating a high level of accuracy in preserving the quality of the original signal during separation.…
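The pipeline in the abstract (waveform to STFT spectrogram, spectrogram segmentation, back to audio) can be sketched with NumPy alone. This sketch assumes the network's output acts as a soft [0, 1] mask over the mixture spectrogram and that the mixture phase is reused for resynthesis. The abstract does not say either, and the 1024-point window and 256-sample hop are also assumptions. An oracle ratio mask computed from the known sources stands in for the U-Net's prediction. The names `stft`, `istft` and `separate` are hypothetical helpers, not the repository's code.

```python
import numpy as np

N_FFT, HOP = 1024, 256  # assumed analysis settings, not taken from the paper

def _window(n_fft: int) -> np.ndarray:
    return np.hanning(n_fft + 1)[:-1]  # periodic Hann window

def stft(x: np.ndarray, n_fft: int = N_FFT, hop: int = HOP) -> np.ndarray:
    """Complex STFT with shape (n_fft // 2 + 1 frequency bins, frames)."""
    w = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(Z: np.ndarray, n_fft: int = N_FFT, hop: int = HOP) -> np.ndarray:
    """Inverse STFT by windowed overlap-add, normalized by the summed squared window."""
    w = _window(n_fft)
    frames = np.fft.irfft(Z.T, n=n_fft, axis=1) * w
    out_len = n_fft + hop * (frames.shape[0] - 1)
    y, wsum = np.zeros(out_len), np.zeros(out_len)
    for i, frame in enumerate(frames):
        y[i * hop : i * hop + n_fft] += frame
        wsum[i * hop : i * hop + n_fft] += w ** 2
    nonzero = wsum > 1e-8
    y[nonzero] /= wsum[nonzero]
    return y

def separate(mix: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Scale the mixture's time-frequency cells by a [0, 1] mask; keep the mixture phase."""
    return istft(mask * stft(mix))

# Toy example: a 440 Hz "vocal" over a 110 Hz "accompaniment".
sr = 8000
t = np.arange(2 * sr) / sr
vocal = 0.5 * np.sin(2 * np.pi * 440 * t)
accomp = 0.5 * np.sin(2 * np.pi * 110 * t)
mix = vocal + accomp

# Oracle ratio mask standing in for the U-Net's prediction (hypothetical).
V, A = np.abs(stft(vocal)), np.abs(stft(accomp))
mask = V / (V + A + 1e-8)
vocal_est = separate(mix, mask)
```

In practice, a library STFT (for example `librosa.stft` or `torch.stft`) would replace the hand-rolled transform. The magnitude spectrogram `np.abs(stft(mix))` is the "image" a U-Net would segment.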


Code & Models

Repositories

mbrotos/soundseg (Official)


Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing