HarmoF0: Logarithmic Scale Dilated Convolution For Pitch Estimation
Weixing Wei, Peilin Li, Yi Yu, Wei Li

TL;DR
HarmoF0 applies dilated convolutions with multiple dilation rates to logarithmic-scale spectrograms for pitch estimation. It outperforms DeepF0 on three datasets while using over 90% fewer parameters, and it shows stronger noise resistance.
Contribution
The paper presents HarmoF0, a fully convolutional network that uses multiple rates dilated causal convolution (MRDC-Conv) to capture harmonic structure in logarithmic-scale spectrograms for pitch estimation.
Findings
Outperforms DeepF0 on three datasets
Cuts the parameter count by more than 90%
Shows stronger noise resistance and fewer octave errors
Abstract
Sounds, especially music, contain various harmonic components scattered along the frequency dimension. It is difficult for ordinary convolutional neural networks to observe these overtones. This paper introduces a multiple rates dilated causal convolution (MRDC-Conv) method to capture the harmonic structure in logarithmic-scale spectrograms efficiently. Harmonic structure is helpful for pitch estimation, which is important for many sound processing applications. We propose HarmoF0, a fully convolutional network, to evaluate MRDC-Conv and other dilated convolutions in pitch estimation. The results show that this model outperforms DeepF0, yields state-of-the-art performance on three datasets, and simultaneously reduces the parameter count by more than 90%. We also find that it has stronger noise resistance and fewer octave errors. The code and pre-trained model are available at…
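To make the idea in the abstract concrete: on a logarithmic frequency axis with Q bins per octave, the k-th harmonic of any fundamental sits round(Q · log2(k)) bins above it, so one fixed set of dilation offsets lines up the overtones for every pitch. The sketch below is a minimal NumPy illustration under that assumption, not the authors' MRDC-Conv implementation. The function names, the choice of 48 bins per octave, and the uniform weights are all hypothetical.

```python
import numpy as np


def harmonic_offsets(num_harmonics, bins_per_octave):
    """Bin offsets of harmonics 1..num_harmonics above a fundamental
    on a log-frequency axis (hypothetical helper, not from the paper)."""
    k = np.arange(1, num_harmonics + 1)
    return np.round(bins_per_octave * np.log2(k)).astype(int)


def multi_rate_dilated_sum(spec, weights, bins_per_octave):
    """Weighted sum of a log-frequency spectrogram with shifted copies of
    itself, one shift per harmonic. spec has shape (freq_bins, frames).
    Bins past the top of the spectrogram are treated as zero."""
    offsets = harmonic_offsets(len(weights), bins_per_octave)
    n_bins = spec.shape[0]
    out = np.zeros(spec.shape, dtype=float)
    for w, d in zip(weights, offsets):
        if d >= n_bins:
            continue  # this harmonic lies entirely above the spectrogram
        # Output bin b collects the input at bin b + d.
        out[: n_bins - d] += w * spec[d:]
    return out


# Toy example: one frame with a fundamental at bin 10 and three overtones.
Q = 48  # bins per octave, an assumption for this sketch
offsets = harmonic_offsets(4, Q)
print(offsets.tolist())  # [0, 48, 76, 96]

spec = np.zeros((120, 1))
spec[10 + offsets, 0] = 1.0  # energy at the fundamental and its harmonics

salience = multi_rate_dilated_sum(spec, np.ones(4), Q)
print(int(salience[:, 0].argmax()))  # 10
```

In the toy example the response peaks at the fundamental's bin because that is the only position where all four shifted copies overlap. A learned layer would replace the uniform weights with trained filters, but the offsets themselves depend only on Q.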
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Advanced Adaptive Filtering Techniques
Methods: Causal Convolution · Convolution · Dilated Causal Convolution
