TL;DR
This paper introduces SpectMamba, a semi-supervised singing melody extraction network. It uses Vision Mamba for linear-complexity context modeling, a note-f0 decoder for musically grounded pitch estimation, and confidence binary regularization to exploit unlabeled data, and it outperforms prior models on multiple public datasets.
Contribution
The paper presents a novel Mamba-based network for semi-supervised singing melody extraction that combines a linear-complexity architecture, a note-f0 decoder, and a confidence binary regularization module.
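This summary does not say how the note-f0 decoder works internally. As background for why notes and f0 are distinct targets, here is a minimal sketch of the standard mapping between them: MIDI note m = 69 + 12·log2(f0 / 440 Hz), split into the nearest semitone (note level) and a residual in cents (frequency level). The function names `hz_to_midi` and `split_note_and_cents` are hypothetical helpers for illustration, not part of SpectMamba.

```python
import math
from typing import Tuple

# Standard MIDI convention: A4 = 440 Hz is note number 69, 12 semitones per octave.
A4_HZ = 440.0
A4_MIDI = 69


def hz_to_midi(f0_hz: float) -> float:
    """Map a fundamental frequency in Hz to a fractional MIDI note number."""
    if f0_hz <= 0:
        # Unvoiced frames carry no pitch, so there is no note to map to.
        raise ValueError("f0 must be positive")
    return A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)


def split_note_and_cents(f0_hz: float) -> Tuple[int, float]:
    """Split f0 into the nearest equal-tempered note and a residual in cents."""
    midi = hz_to_midi(f0_hz)
    note = round(midi)             # coarse, note-level target
    cents = 100.0 * (midi - note)  # fine, frequency-level offset in [-50, 50]
    return note, cents


if __name__ == "__main__":
    for f in (440.0, 261.63, 450.0):
        note, cents = split_note_and_cents(f)
        print(f"{f:7.2f} Hz -> MIDI note {note}, {cents:+.1f} cents")
```

A 450 Hz frame, for instance, lands on the same note as 440 Hz (A4) but sits roughly 39 cents sharp, which is the kind of distinction a purely frequency-supervised model does not make explicit.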
Findings
Effective on multiple public datasets
Outperforms existing methods in accuracy and efficiency
Leverages unlabeled data successfully
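The exact form of confidence binary regularization is not given in this summary. To illustrate the general idea of using unlabeled frames only where the model is confident, here is a sketch of a swapped-in, generic technique: a binary keep/drop mask on frames whose top pitch-bin probability exceeds a threshold, with a cross-entropy loss against the argmax pseudo-label. The frame layout (per-frame logits over pitch bins), the 0.9 threshold, and all function names are assumptions, and this is not the paper's method.

```python
import math
from typing import List, Tuple

Frame = List[float]  # hypothetical layout: one frame's logits over pitch bins


def softmax(logits: Frame) -> Frame:
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confident_pseudo_labels(frames: List[Frame], threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Return (frame_index, pseudo_label) for frames whose top probability >= threshold.

    The per-frame keep/drop decision is a generic binary confidence mask,
    not the paper's confidence binary regularization.
    """
    kept = []
    for i, logits in enumerate(frames):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            kept.append((i, best))
    return kept


def masked_pseudo_label_loss(frames: List[Frame], threshold: float = 0.9) -> float:
    """Mean cross-entropy against pseudo-labels, averaged over confident frames only."""
    kept = confident_pseudo_labels(frames, threshold)
    if not kept:
        return 0.0  # no confident frames: unlabeled data contributes nothing
    losses = [-math.log(softmax(frames[i])[label]) for i, label in kept]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    unlabeled = [[5.0, 0.0, 0.0], [0.1, 0.0, 0.2]]  # one confident, one ambiguous frame
    print(confident_pseudo_labels(unlabeled))
    print(masked_pseudo_label_loss(unlabeled))
```

For brevity this computes the loss on the same logits that produced the pseudo-labels; common semi-supervised setups instead take pseudo-labels from one (e.g. weakly augmented) view and apply the loss to another.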
Abstract
Singing melody extraction (SME) is a key task in music information retrieval. However, existing methods face several limitations. First, prior models use transformers to capture contextual dependencies, which incurs quadratic computational cost and results in low efficiency at inference. Second, prior works typically rely on frequency-supervised methods to estimate the fundamental frequency (f0), ignoring the fact that musical performance is actually based on notes. Third, transformers typically require large amounts of labeled data to achieve optimal performance, but the SME task lacks sufficient annotated data. To address these issues, we propose a Mamba-based network, called SpectMamba, for semi-supervised singing melody extraction using confidence binary regularization. In particular, we begin by introducing Vision Mamba to achieve…
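The efficiency argument comes down to counting work per layer: for a sequence of L frames, self-attention computes L² pairwise scores per head (10⁶ for L = 1,000), while a state-space scan performs L state updates. The toy sketch below shows a scalar, input-dependent ("selective") recurrence in the spirit of the Mamba paper listed under Taxonomy. It is an illustration under simplifying assumptions (scalar state, A fixed to -1, fixed B and C, an Euler-style discretization of B), not SpectMamba's or Vision Mamba's implementation, and `selective_scan` and its parameters are hypothetical.

```python
import math
from typing import List


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    xs: List[float],
    w_delta: float = 1.0,
    bias: float = 0.0,
    B: float = 1.0,
    C: float = 1.0,
) -> List[float]:
    """Scalar selective state-space scan (a toy version of a Mamba-style recurrence).

    For each step t:
        delta_t = softplus(w_delta * x_t + bias)   # input-dependent step size
        a_t     = exp(-delta_t)                    # discretized decay, with A = -1
        h_t     = a_t * h_{t-1} + delta_t * B * x_t
        y_t     = C * h_t
    The loop touches each element once: O(L) time and O(1) state.
    """
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + bias)  # "selection": depends on the current input
        a = math.exp(-delta)                  # larger delta -> forget more of the past
        h = a * h + delta * B * x             # linear state update
        ys.append(C * h)                      # readout
    return ys


if __name__ == "__main__":
    frames = [0.5, 1.0, -0.3, 0.0, 2.0]  # hypothetical 1-D feature sequence
    print(selective_scan(frames))
```

Because each update depends only on the previous state, outputs are causal and can be produced frame by frame. In the actual Mamba design B and C are also input-dependent and the state is a vector per channel, and Vision Mamba additionally scans patch sequences in both directions; this 1-D scalar sketch omits all of that.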
Taxonomy
Methods: "Mamba: Linear-Time Sequence Modeling with Selective State Spaces"
