Human Voice Pitch Estimation: A Convolutional Network with Auto-Labeled and Synthetic Data

Jeremy Cochoy

arXiv:2308.07170 · cs.SD · December 19, 2023

TL;DR

This paper introduces a convolutional neural network for human voice pitch estimation that leverages auto-labeled and synthetic data, achieving robust performance across diverse audio datasets in music and voice applications.

Contribution

It presents a novel CNN architecture trained on combined synthetic and auto-labeled data for improved pitch extraction from human singing voices.

Findings

01 Effective across synthetic and real-world datasets

02 Outperforms traditional pitch estimation methods

03 Robust in diverse singing and speech scenarios

Abstract

In the domain of music and sound processing, pitch extraction plays a pivotal role. Our research presents a specialized convolutional neural network designed for pitch extraction, particularly from the human singing voice in acapella performances. Notably, our approach combines synthetic data with auto-labeled acapella sung audio, creating a robust training environment. Evaluation across datasets comprising synthetic sounds, opera recordings, and time-stretched vowels demonstrates its efficacy. This work paves the way for enhanced pitch extraction in both music and voice settings.
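To make the idea of synthetic training data concrete, here is a minimal sketch of how a signal with a known pitch label could be generated. It is an illustration under stated assumptions, not the paper's data pipeline: the function names `synth_voice_like_tone` and `hz_to_midi`, the 16 kHz sample rate, the number of harmonics, and the 1/k amplitude decay are all hypothetical choices made for this example.

```python
import math
import random


def synth_voice_like_tone(f0_hz, duration_s=0.1, sample_rate=16000,
                          n_harmonics=5, seed=0):
    """Return (samples, f0_hz) for a harmonic tone.

    The pitch label is exact because the signal is built from it.
    All parameter defaults are assumptions made for illustration.
    """
    rng = random.Random(seed)
    # Assumed spectral shape: harmonic k has amplitude ~1/k, with slight jitter.
    amps = [(1.0 / k) * rng.uniform(0.8, 1.2) for k in range(1, n_harmonics + 1)]
    norm = sum(amps)  # dividing by this keeps every sample within [-1, 1]
    n_samples = round(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = sum(a * math.sin(2.0 * math.pi * k * f0_hz * t)
                    for k, a in enumerate(amps, start=1))
        samples.append(value / norm)
    return samples, f0_hz


def hz_to_midi(f_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number, A4 = 440 Hz = 69."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


if __name__ == "__main__":
    audio, label_hz = synth_voice_like_tone(220.0)
    print(len(audio), label_hz, hz_to_midi(label_hz))
```

Because the label comes from the generator rather than from an annotator, such examples carry no labeling noise, which is one plausible reason to mix them with auto-labeled recordings whose labels are less certain.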

Code & Models

Repositories

jeremycochoy/pitchnet (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing