Improving Neural Pitch Estimation with SWIPE Kernels
David Marttila, Joshua D. Reiss

TL;DR
This paper shows that using Sawtooth-Inspired Pitch Estimation (SWIPE) kernels as an audio frontend makes neural pitch estimators more accurate, more robust to noise, and more parameter-efficient, allowing networks about an order of magnitude smaller with no loss in performance.
Contribution
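The study proposes SWIPE kernels as a hand-crafted, task-specific audio frontend for neural pitch estimators, as an alternative to the raw waveforms or general-purpose time-frequency representations that most approaches use, and evaluates it on supervised and self-supervised state-of-the-art architectures.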
Findings
A SWIPE frontend makes neural pitch estimators more accurate and more robust to noise.
With the SWIPE frontend, network size can be reduced by an order of magnitude without performance degradation.
SWIPE alone outperforms many state-of-the-art neural estimators.
Abstract
Neural networks have become the dominant technique for accurate pitch and periodicity estimation. Although a lot of research has gone into improving network architectures and training paradigms, most approaches operate directly on the raw audio waveform or on general-purpose time-frequency representations. We investigate the use of Sawtooth-Inspired Pitch Estimation (SWIPE) kernels as an audio frontend and find that these hand-crafted, task-specific features can make neural pitch estimators more accurate, robust to noise, and more parameter-efficient. We evaluate supervised and self-supervised state-of-the-art architectures on common datasets and show that the SWIPE audio frontend allows for reducing the network size by an order of magnitude without performance degradation. Additionally, we show that the SWIPE algorithm on its own is much more accurate than commonly reported,…
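To make the idea of a SWIPE-style frontend concrete, the following is a minimal sketch, not the authors' implementation. It builds one kernel per pitch candidate (cosine lobes at the harmonics, negative valleys between them, a 1/sqrt(f) envelope) and correlates each kernel with the square-root magnitude spectrum of a frame, giving a vector of pitch strengths that a network could consume instead of raw audio. Several simplifications are assumptions made here for brevity: linear FFT bins rather than an ERB frequency scale, a single window size, and all harmonics rather than a selected subset. The function names `swipe_like_kernel` and `swipe_like_features` are hypothetical.

```python
import numpy as np

def swipe_like_kernel(freqs, f0, n_harmonics=8):
    """Simplified SWIPE-style kernel for one pitch candidate f0 (in Hz)."""
    ratio = freqs / f0  # frequency axis in units of the candidate pitch
    kernel = np.zeros_like(freqs, dtype=float)
    for k in range(1, n_harmonics + 1):
        dist = np.abs(ratio - k)
        # Positive cosine lobe centred on harmonic k.
        peak = dist < 0.25
        kernel[peak] = np.cos(2 * np.pi * ratio[peak])
        # Negative valleys on both sides; neighbouring harmonics each add
        # half, so only the outermost valleys end up with half weight.
        valley = (dist > 0.25) & (dist < 0.75)
        kernel[valley] += 0.5 * np.cos(2 * np.pi * ratio[valley])
    kernel *= np.sqrt(1.0 / freqs)  # decaying envelope (freqs must exclude 0 Hz)
    positive_norm = np.linalg.norm(kernel[kernel > 0])
    return kernel / positive_norm if positive_norm > 0 else kernel

def swipe_like_features(frame, sr, candidates, n_harmonics=8):
    """Pitch strength of one audio frame for every candidate (assumed frontend)."""
    window = np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(frame * window))[1:]   # drop the DC bin
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)[1:]
    loudness = np.sqrt(magnitude)                          # square-root spectrum
    loudness /= np.linalg.norm(loudness) + 1e-12
    kernels = np.stack(
        [swipe_like_kernel(freqs, f0, n_harmonics) for f0 in candidates]
    )
    return kernels @ loudness                              # one strength per candidate

# Usage: a synthetic harmonic tone at 220 Hz (illustrative values only).
sr = 16000
t = np.arange(2048) / sr
frame = sum(np.sin(2 * np.pi * 220.0 * k * t) / k for k in range(1, 11))
candidates = np.geomspace(60.0, 1000.0, 200)
strengths = swipe_like_features(frame, sr, candidates)
estimate = candidates[np.argmax(strengths)]
print(f"strongest candidate: {estimate:.1f} Hz")
```

In a setup like the one the abstract describes, the `strengths` vector for each frame would serve as the network input; how the paper wires these features into each architecture is not stated in the abstract, so that part is left out here.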
Taxonomy
Topics: Neural Networks and Applications
