Inferring Pitch from Coarse Spectral Features
Danni Ma, Neville Ryant, Mark Liberman

TL;DR
This paper shows that relatively coarse spectra, in which no overtone series is available, carry enough information for linear regression to predict the pitch of simple vocalizations, underscoring that F0 is only an approximation to pitch.
Contribution
It applies linear regression to single frames of coarse spectra to predict pitch, challenging the treatment of F0 as the sole physical definition of pitch.
Findings
Coarse spectral features can predict pitch in simple vocalizations.
Prediction accuracy decreases with more complex vocalizations.
For more complex vocalizations, the covariates for pitch are themselves more complex, but they remain accessible to more advanced models.
Abstract
Fundamental frequency (F0) has long been treated as the physical definition of "pitch" in phonetic analysis. But there have been many demonstrations that F0 is at best an approximation to pitch, both in production and in perception: pitch is not F0, and F0 is not pitch. Changes in pitch involve many articulatory and acoustic covariates; pitch perception often deviates from what F0 analysis predicts; and in fact, quasi-periodic signals from a single voice source are often incompletely characterized by an attempt to define a single time-varying F0. In this paper, we find strong support for the existence of covariates for pitch in aspects of relatively coarse spectra, in which an overtone series is not available. Thus linear regression can predict the pitch of simple vocalizations, produced by an articulatory synthesizer or by humans, from single frames of such coarse spectra. Across…
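The abstract's central idea, linear regression from single frames of coarse spectra to pitch, can be illustrated with a minimal sketch. This is not the authors' pipeline: the sample rate, frame length, band layout, toy harmonic source with 1/k partial amplitudes, and the helper names `synth_frame`, `coarse_spectrum`, `fit_linear`, and `predict` are all assumptions made for illustration. In this toy setup the only pitch covariate is how many partials fall into each wide band, which is an artifact of the synthesis and not a claim about real voices.

```python
import numpy as np

SR = 16000     # sample rate in Hz (assumed)
FRAME = 1024   # samples per analysis frame (assumed)
N_BANDS = 8    # 1000 Hz-wide bands: too wide to resolve individual overtones here


def synth_frame(f0, rng):
    """Toy harmonic frame: partials at k*f0 with 1/k amplitude and random phase."""
    t = np.arange(FRAME) / SR
    k = np.arange(1, int((SR / 2 - 1) // f0) + 1)  # keep partials below Nyquist
    phases = rng.uniform(0.0, 2.0 * np.pi, size=k.size)
    partials = np.sin(2.0 * np.pi * np.outer(k * f0, t) + phases[:, None])
    return np.sum(partials / k[:, None], axis=0)


def coarse_spectrum(frame):
    """Log mean power in N_BANDS equal-width bands of a Hann-windowed frame."""
    power = np.abs(np.fft.rfft(frame * np.hanning(FRAME))) ** 2
    bands = power[: FRAME // 2].reshape(N_BANDS, -1).mean(axis=1)
    return np.log(bands + 1e-12)  # small offset avoids log(0)


def fit_linear(X, y):
    """Ordinary least squares with an intercept column appended."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w


rng = np.random.default_rng(0)
f0 = rng.uniform(100.0, 300.0, size=400)         # toy F0 range in Hz
X = np.array([coarse_spectrum(synth_frame(f, rng)) for f in f0])
y = 12.0 * np.log2(f0 / 100.0)                   # target in semitones re 100 Hz

X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]
w = fit_linear(X_train, y_train)

rmse_train = np.sqrt(np.mean((predict(w, X_train) - y_train) ** 2))
rmse_test = np.sqrt(np.mean((predict(w, X_test) - y_test) ** 2))
print(f"train RMSE: {rmse_train:.2f} st, held-out RMSE: {rmse_test:.2f} st")
```

Each row of `X` is one frame's coarse spectrum, so the regression sees no temporal context. Swapping in real or articulatory-synthesizer recordings would mean replacing `synth_frame` with frames cut from audio and supplying reference pitch values for `y`.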
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
