Singing Voice Separation and Vocal F0 Estimation based on Mutual Combination of Robust Principal Component Analysis and Subharmonic Summation
Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii

TL;DR
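This paper introduces a method that improves both singing voice separation and vocal F0 estimation by letting each task use the other's result: robust PCA (RPCA) gives a rough vocal separation, the vocal F0 contour is estimated from that separation, and the contour is then used to separate the vocals again more accurately. The method is reported to outperform competing methods.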
Contribution
The paper presents a mutually dependent approach in which an RPCA-based separation supports vocal F0 estimation, and the estimated F0 contour in turn supports a more accurate separation through harmonic masks.
Findings
Significantly improved the performance of both singing voice separation and vocal F0 estimation.
Outperformed all other singing voice separation methods submitted to MIREX 2014.
Demonstrated the effectiveness of combining RPCA with harmonic masks.
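To make the idea of combining an RPCA mask with a harmonic mask concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the fixed pass-band width around each harmonic, the number of harmonics, and the element-wise AND used to merge the two masks are all assumptions chosen for illustration.

```python
import numpy as np

def harmonic_mask(f0_hz, freqs_hz, n_harmonics=10, half_width_hz=40.0):
    """Binary (n_bins x n_frames) mask passing bins near multiples of each frame's F0.

    f0_hz: per-frame F0 in Hz, with 0 marking unvoiced frames (assumed convention).
    freqs_hz: centre frequency of each spectrogram bin in Hz.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    mask = np.zeros((len(freqs_hz), len(f0_hz)), dtype=bool)
    for t, f0 in enumerate(f0_hz):
        if f0 <= 0:
            continue  # unvoiced frame: pass nothing
        for h in range(1, n_harmonics + 1):
            # Pass every bin within half_width_hz of the h-th harmonic.
            mask[:, t] |= np.abs(freqs_hz - h * f0) <= half_width_hz
    return mask

def combine_masks(rpca_mask, harm_mask):
    """Keep a bin only if both masks pass it (assumed combination rule)."""
    return np.logical_and(rpca_mask, harm_mask)
```

In use, `rpca_mask` would be a binary time-frequency mask derived from the sparse component of an RPCA decomposition, and the combined mask would be applied to the mixture spectrogram before inverting it back to audio.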
Abstract
This paper presents a new method of singing voice analysis that performs mutually-dependent singing voice separation and vocal fundamental frequency (F0) estimation. Vocal F0 estimation is considered to become easier if singing voices can be separated from a music audio signal, and vocal F0 contours are useful for singing voice separation. This calls for an approach that improves the performance of each of these tasks by using the results of the other. The proposed method first performs robust principal component analysis (RPCA) for roughly extracting singing voices from a target music audio signal. The F0 contour of the main melody is then estimated from the separated singing voices by finding the optimal temporal path over an F0 saliency spectrogram. Finally, the singing voices are separated again more accurately by combining a conventional time-frequency mask given by RPCA with…
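The step of finding an optimal temporal path over an F0 saliency spectrogram can be sketched with Viterbi-style dynamic programming. The following is a rough illustration only, assuming the saliency spectrogram has already been computed (for example by subharmonic summation). The function name `track_f0_path`, the log-saliency score, and the linear jump penalty are hypothetical choices, not details taken from the paper.

```python
import numpy as np

def track_f0_path(saliency, max_jump=2, jump_penalty=0.1):
    """Return one F0-bin index per frame along the best-scoring path.

    saliency: non-negative array of shape (n_f0_bins, n_frames).
    The path maximises the sum of log-saliency minus a penalty
    proportional to the bin jump between consecutive frames.
    """
    n_bins, n_frames = saliency.shape
    log_s = np.log(saliency + 1e-12)  # small constant avoids log(0)
    score = np.full((n_bins, n_frames), -np.inf)
    back = np.zeros((n_bins, n_frames), dtype=int)
    score[:, 0] = log_s[:, 0]
    for t in range(1, n_frames):
        for k in range(n_bins):
            # Only predecessors within max_jump bins are allowed.
            lo, hi = max(0, k - max_jump), min(n_bins, k + max_jump + 1)
            cand = score[lo:hi, t - 1] - jump_penalty * np.abs(np.arange(lo, hi) - k)
            j = int(np.argmax(cand))
            back[k, t] = lo + j
            score[k, t] = cand[j] + log_s[k, t]
    # Backtrack from the best final bin.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = int(np.argmax(score[:, -1]))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[path[t], t]
    return path
```

The jump limit and penalty encode the assumption that a sung melody moves smoothly in frequency, so isolated saliency peaks far from the current contour are less likely to be picked.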
