KARMA: Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking
Daryush D. Mehta, Daniel Rudoy, Patrick J. Wolfe

TL;DR
KARMA introduces a Kalman-based state-space approach to formant and antiformant tracking in speech signals. It provides uncertainties alongside the estimates, which classical point-estimate trackers do not, and achieves lower formant-tracking error than WaveSurfer and Praat.
Contribution
This work presents a Kalman-based ARMA modeling framework that jointly estimates the center frequencies and bandwidths of formants and antiformants together with their uncertainties. This improves on existing methods, which yield only point estimates.
Findings
Lower overall root-mean-square formant-tracking error than the WaveSurfer and Praat benchmarks
Effective antiformant tracking demonstrated on synthesized and spoken nasal sounds
Simultaneous uncertainty estimation enables confidence-aware analysis
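The two quantities behind these findings, tracking error and uncertainty, can be illustrated with a minimal NumPy sketch. It assumes estimated and reference tracks are aligned frame by frame as arrays of shape (frames, formants), and that the tracker reports a per-frame Gaussian variance for each formant. The function names are hypothetical, and the approximate 95% band (z = 1.96) is an illustrative choice rather than the paper's stated evaluation protocol.

```python
import numpy as np

def formant_rmse(estimates, reference):
    """Per-formant RMSE (Hz) between aligned tracks of shape (frames, formants)."""
    est = np.asarray(estimates, dtype=float)
    ref = np.asarray(reference, dtype=float)
    return np.sqrt(np.mean((est - ref) ** 2, axis=0))

def gaussian_band(means, variances, z=1.96):
    """Lower/upper limits of an approximate 95% band from Gaussian means and variances."""
    m = np.asarray(means, dtype=float)
    sd = np.sqrt(np.asarray(variances, dtype=float))
    return m - z * sd, m + z * sd

def band_coverage(reference, lower, upper):
    """Fraction of frames, per formant, where the reference lies inside the band."""
    ref = np.asarray(reference, dtype=float)
    return np.mean((ref >= lower) & (ref <= upper), axis=0)

# Toy example (hypothetical values): two frames, two formants (F1, F2), in Hz.
est = np.array([[500.0, 1500.0], [520.0, 1480.0]])
ref = np.array([[510.0, 1500.0], [500.0, 1500.0]])
var = np.array([[100.0, 400.0], [100.0, 400.0]])   # assumed posterior variances (Hz^2)

print(formant_rmse(est, ref))        # about 15.81 Hz (F1) and 14.14 Hz (F2)
lower, upper = gaussian_band(est, var)
print(band_coverage(ref, lower, upper))  # coverage 0.5 for F1, 1.0 for F2
```

Coverage of a reference track by the reported band is one simple way to check whether the stated uncertainties are calibrated, which a point-estimate tracker cannot be assessed on at all.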
Abstract
Vocal tract resonance characteristics in acoustic speech signals are classically tracked using frame-by-frame point estimates of formant frequencies followed by candidate selection and smoothing using dynamic programming methods that minimize ad hoc cost functions. The goal of the current work is to provide both point estimates and associated uncertainties of center frequencies and bandwidths in a statistically principled state-space framework. Extended Kalman (K) algorithms take advantage of a linearized mapping to infer formant and antiformant parameters from frame-based estimates of autoregressive moving average (ARMA) cepstral coefficients. Error analysis of KARMA, WaveSurfer, and Praat is accomplished in the all-pole case using a manually marked formant database and synthesized speech waveforms. KARMA formant tracks exhibit lower overall root-mean-square error relative to the two…
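The core step described in the abstract, a linearized mapping from formant and antiformant parameters to ARMA cepstral coefficients inside an extended Kalman filter, can be sketched in NumPy. The sketch assumes each formant (pole pair) and antiformant (zero pair) is a minimum-phase conjugate pair with radius r = exp(-pi*b/fs) and angle theta = 2*pi*f/fs. This gives the standard relation c_n = (2/n) [sum over poles of r^n cos(n*theta) minus sum over zeros of r^n cos(n*theta)]. The random-walk state model, the state layout, the noise levels, and the function names `cepstrum_and_jacobian` and `ekf_step` are illustrative assumptions, not the authors' implementation. The frame-based cepstral estimates `y` are taken as given.

```python
import numpy as np

def cepstrum_and_jacobian(x, num_poles, fs, num_ceps):
    """Cepstral coefficients c_1..c_N and their Jacobian for a pole/zero state.

    Assumed state layout (Hz): [pole freqs, pole bandwidths, zero freqs, zero bandwidths].
    Each resonance is a minimum-phase conjugate pair with radius exp(-pi*b/fs)
    and angle 2*pi*f/fs, so it contributes (2/n) r^n cos(n*theta) to c_n.
    """
    x = np.asarray(x, dtype=float)
    p = num_poles
    q = (len(x) - 2 * p) // 2                      # number of zero pairs
    f_p, b_p = x[:p], x[p:2 * p]
    f_z, b_z = x[2 * p:2 * p + q], x[2 * p + q:]
    n = np.arange(1, num_ceps + 1)[:, None]        # cepstral index, shape (N, 1)

    def parts(f, b):
        decay = np.exp(-np.pi * n * b[None, :] / fs)     # r**n
        angle = 2 * np.pi * n * f[None, :] / fs          # n * theta
        value = (2.0 / n) * decay * np.cos(angle)
        d_f = -(4 * np.pi / fs) * decay * np.sin(angle)  # d value / d f
        d_b = -(2 * np.pi / fs) * decay * np.cos(angle)  # d value / d b
        return value, d_f, d_b

    vp, dfp, dbp = parts(f_p, b_p)
    vz, dfz, dbz = parts(f_z, b_z)
    c = vp.sum(axis=1) - vz.sum(axis=1)            # zeros (antiformants) subtract
    H = np.hstack([dfp, dbp, -dfz, -dbz])          # shape (N, len(x))
    return c, H

def ekf_step(x, P, y, Q, R, num_poles, fs):
    """One extended Kalman filter step under an assumed random-walk state model."""
    x_pred, P_pred = x, P + Q                       # predict: mean kept, covariance grows
    c_pred, H = cepstrum_and_jacobian(x_pred, num_poles, fs, len(y))
    S = H @ P_pred @ H.T + R                        # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T            # gain P H^T S^{-1} (S, P symmetric)
    x_new = x_pred + K @ (y - c_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy usage (hypothetical values): two formants and one antiformant at fs = 8 kHz.
fs = 8000.0
x_true = np.array([500.0, 1500.0, 80.0, 120.0, 1000.0, 150.0])
y, _ = cepstrum_and_jacobian(x_true, num_poles=2, fs=fs, num_ceps=15)
x0 = x_true + np.array([40.0, -60.0, 20.0, -20.0, 50.0, 30.0])
P0 = np.diag(np.full(6, 100.0 ** 2))
Q = np.diag(np.full(6, 10.0 ** 2))
R = 1e-4 * np.eye(15)
x1, P1 = ekf_step(x0, P0, y, Q, R, num_poles=2, fs=fs)
print(np.round(x1, 1))
print(np.sqrt(np.diag(P1)))    # posterior standard deviations (Hz)
```

Because the cepstral mapping has a closed form, the Jacobian `H` is analytic rather than numerical. The covariance `P` returned at each frame supplies the per-frame uncertainty that point-estimate trackers lack.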
