A Streamlined Encoder/Decoder Architecture for Melody Extraction
Tsung-Han Hsieh, Li Su, Yi-Hsuan Yang

TL;DR
This paper introduces a simplified encoder/decoder neural network for melody extraction that achieves near state-of-the-art results with fewer layers by utilizing pooling indices and a novel approach for melody existence estimation.
Contribution
The paper presents a streamlined architecture for melody extraction that reduces complexity while maintaining high accuracy, including a new method for melody existence detection.
Findings
Achieves near state-of-the-art performance with fewer convolutional layers
Uses pooling indices for better localization of melody in frequency
Estimates melody existence from the bottleneck layer, so a simple argmax replaces ad-hoc thresholding when producing the final melody line
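The pooling-index idea in the findings above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the non-overlapping window of size 4, and the toy input are assumptions made for illustration only. The sketch pools a single frame along the frequency axis, remembers where each maximum came from, and uses those positions to place the values back at the original resolution.

```python
def max_pool_with_indices(x, size):
    """Max-pool a 1-D list over non-overlapping windows.

    Returns the pooled values and the position of each maximum
    in the original list (the "pooling indices").
    """
    pooled, indices = [], []
    for start in range(0, len(x) - size + 1, size):
        window = x[start:start + size]
        # Offset of the largest value inside this window.
        offset = max(range(size), key=lambda i: window[i])
        pooled.append(window[offset])
        indices.append(start + offset)
    return pooled, indices


def max_unpool(pooled, indices, length):
    """Place each pooled value back at its remembered position; zeros elsewhere."""
    out = [0.0] * length
    for value, idx in zip(pooled, indices):
        out[idx] = value
    return out


# Hypothetical activations over 8 frequency bins for one time frame.
frame = [0.1, 0.9, 0.2, 0.3, 0.0, 0.4, 0.8, 0.1]
pooled, indices = max_pool_with_indices(frame, size=4)
restored = max_unpool(pooled, indices, len(frame))
```

Because the decoder reuses the encoder's indices, the two peaks return to bins 1 and 6 rather than being smeared across each window, which is the sense in which the indices help localize the melody in frequency.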
Abstract
Melody extraction in polyphonic musical audio is important for music signal processing. In this paper, we propose a novel streamlined encoder/decoder network that is designed for the task. We make two technical contributions. First, drawing inspiration from a state-of-the-art model for semantic pixel-wise segmentation, we pass through the pooling indices between pooling and un-pooling layers to localize the melody in frequency. We can achieve results close to the state-of-the-art with far fewer convolutional layers and simpler convolution modules. Second, we propose a way to use the bottleneck layer of the network to estimate the existence of a melody line for each time frame, which makes it possible to use a simple argmax function instead of ad-hoc thresholding to get the final estimation of the melody line. Our experiments on both vocal melody extraction and general melody extraction…
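One way to read the second contribution is sketched below. It assumes that the per-frame existence estimate is treated as an extra "no melody" slot next to the frequency-bin scores, so a single argmax both selects the pitch bin and decides whether a melody is present. The abstract does not spell out this mechanism, and the function name, data layout, and scores here are hypothetical.

```python
def decode_melody(salience, no_melody_scores):
    """Decode one melody bin per frame with a plain argmax.

    salience: list of frames, each a list of per-frequency-bin scores.
    no_melody_scores: one "no melody" score per frame (assumed here to
        come from the bottleneck layer).
    Returns a list with the winning bin index per frame, or None when
    the "no melody" slot wins.
    """
    track = []
    for frame, no_melody in zip(salience, no_melody_scores):
        # Slot 0 is "no melody"; slots 1..N are the frequency bins.
        augmented = [no_melody] + frame
        best = max(range(len(augmented)), key=lambda i: augmented[i])
        track.append(None if best == 0 else best - 1)
    return track


# Hypothetical scores for 3 frames and 3 frequency bins.
salience = [[0.1, 0.7, 0.2], [0.2, 0.1, 0.1], [0.0, 0.3, 0.6]]
no_melody_scores = [0.3, 0.8, 0.1]
track = decode_melody(salience, no_melody_scores)
```

In this toy input the second frame is marked as having no melody because its "no melody" score exceeds every bin score. No separately tuned threshold is involved.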
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
