SwiftF0: Fast and Accurate Monophonic Pitch Detection

Lars Nieradzik

arXiv:2508.18440·cs.SD·August 27, 2025

TL;DR

SwiftF0 is a lightweight neural network (95,842 parameters) for monophonic pitch estimation. It reports state-of-the-art accuracy in noisy conditions and runs approximately 42x faster than CREPE on CPU, which makes it suitable for real-time use on resource-constrained devices.

Contribution

The paper introduces SwiftF0, a novel efficient neural model for pitch detection, and a synthetic speech dataset for better training and evaluation.

Findings

01

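Achieves a 91.80% harmonic mean (HM) at 10 dB SNR, outperforming baselines such as CREPE by over 12 percentage points and degrading by only 2.3 points from clean audio.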

02

Runs approximately 42x faster than CREPE on CPU with only 95,842 parameters.

03

Provides a new synthetic dataset and comprehensive evaluation metrics.
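
The harmonic mean in finding 01 combines several evaluation metrics into a single score that is pulled toward the weakest one. This summary does not say which metrics are combined, so the sketch below uses hypothetical metric names and placeholder values purely to show the arithmetic; it is not the paper's evaluation code.

```python
def harmonic_mean_score(metrics):
    """Return the harmonic mean of metric values, each expected in [0, 1]."""
    values = list(metrics.values())
    if not values:
        raise ValueError("at least one metric is required")
    if any(v < 0 for v in values):
        raise ValueError("metric values must be non-negative")
    # A single zero drives the harmonic mean to zero.
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical component metrics: names and values are placeholders,
# not results reported in the paper.
example = {"metric_a": 0.95, "metric_b": 0.90, "metric_c": 0.85}
score = harmonic_mean_score(example)
print(f"harmonic mean: {score:.4f}")
```

Because the harmonic mean never exceeds the arithmetic mean, a model cannot earn a high score by excelling on one metric while failing another.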

Abstract

Accurate and real-time monophonic pitch estimation in noisy conditions, particularly on resource-constrained devices, remains an open challenge in audio processing. We present SwiftF0, a novel, lightweight neural model that sets a new state-of-the-art for monophonic pitch estimation. Through training on diverse speech, music, and synthetic datasets with extensive data augmentation, SwiftF0 achieves robust generalization across acoustic domains while maintaining computational efficiency. SwiftF0 achieves a 91.80% harmonic mean (HM) at 10 dB SNR, outperforming baselines like CREPE by over 12 percentage points and degrading by only 2.3 points from clean audio. SwiftF0 requires only 95,842 parameters and runs approximately 42x faster than CREPE on CPU, making it ideal for efficient, real-time deployment. To address the critical lack of perfectly accurate ground truth pitch in speech…
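
To make the "10 dB SNR" condition concrete, the following is a minimal sketch of mixing noise into a clean signal at a target signal-to-noise ratio. It assumes white Gaussian noise, a 220 Hz sine tone, and a 16 kHz sample rate for illustration; the abstract does not specify the noise types or mixing procedure used in the paper, and the function names here are hypothetical.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples)."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power matches `snr_db`, then add it."""
    if not signal or len(signal) != len(noise):
        raise ValueError("signal and noise must be non-empty and equally long")
    p_signal = mean_power(signal)
    p_noise = mean_power(noise)
    if p_signal == 0 or p_noise == 0:
        raise ValueError("signal and noise must have non-zero power")
    # SNR_dB = 10 * log10(p_signal / (gain^2 * p_noise)), solved for gain.
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to `clean`, treating the difference as noise."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Assumed test conditions: one second of a 220 Hz tone at 16 kHz plus white noise.
sample_rate = 16000
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]
white = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
noisy = mix_at_snr(tone, white, snr_db=10.0)
print(f"measured SNR: {measured_snr_db(tone, noisy):.2f} dB")
```

At 10 dB the signal carries ten times the power of the noise, so the reported 2.3-point drop from clean audio is measured under clearly audible but not overwhelming noise.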
