Formant Estimation and Tracking using Probabilistic Heat-Maps

Yosi Shrem; Felix Kreuk; Joseph Keshet

arXiv:2206.11632 · cs.SD · June 24, 2022

TL;DR

This paper introduces a novel neural network architecture that uses probabilistic heatmaps for more accurate and domain-invariant formant estimation and tracking across diverse speech datasets.

Contribution

A new multi-decoder neural network with a shared encoder and heatmap outputs that improves formant estimation across different speaker and speech domains.
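To illustrate what a heatmap output means here, the sketch below encodes a single formant frequency as a discrete probability distribution over frequency bins and decodes it back to a frequency. It is a minimal illustration under stated assumptions, not the authors' implementation: the 50 Hz bin grid up to 5 kHz, the Gaussian shape and width of the target, the cross-entropy objective, and the function names `formant_to_heatmap`, `heatmap_to_formant` and `cross_entropy` are all hypothetical choices.

```python
import math

# Hypothetical frequency grid (an assumption, not from the paper):
# bin centres every 50 Hz from 0 Hz to 5000 Hz inclusive.
BIN_HZ = 50.0
N_BINS = 101
BIN_CENTRES = [i * BIN_HZ for i in range(N_BINS)]


def formant_to_heatmap(freq_hz, sigma_hz=100.0):
    """Encode one formant frequency as a Gaussian-shaped distribution over bins.

    sigma_hz is an assumed smoothing width; the result is normalised to sum to 1.
    """
    weights = [math.exp(-0.5 * ((c - freq_hz) / sigma_hz) ** 2) for c in BIN_CENTRES]
    total = sum(weights)
    return [w / total for w in weights]


def heatmap_to_formant(heatmap):
    """Decode a heatmap into a frequency estimate.

    Returns (argmax_hz, expected_hz): the centre of the most probable bin and
    the probability-weighted mean frequency over all bins.
    """
    best = max(range(len(heatmap)), key=lambda i: heatmap[i])
    expected = sum(p * c for p, c in zip(heatmap, BIN_CENTRES))
    return BIN_CENTRES[best], expected


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target heatmap and a predicted one (assumed loss).

    eps avoids log(0) for bins whose probability underflows to zero.
    """
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


target = formant_to_heatmap(730.0)  # e.g. a typical F1 value
argmax_hz, expected_hz = heatmap_to_formant(target)
print(argmax_hz, round(expected_hz, 1))  # → 750.0 730.0
```

Decoding with argmax snaps to the nearest bin centre (750 Hz), whereas the probability-weighted mean recovers approximately 730 Hz; reading out a distribution rather than a single regressed number is one way such an output can keep sub-bin precision and expose how confident the model is.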

Findings

01 Enhanced formant tracking accuracy across multiple domains

02 Better domain generalization compared to existing methods

03 Heatmap-based probability distributions improve robustness

Abstract

Formants are the spectral maxima that result from acoustic resonances of the human vocal tract, and their accurate estimation is among the most fundamental speech processing problems. Recent work has shown that these frequencies can be estimated accurately using deep learning techniques. However, when presented with speech from a domain different from the one they were trained on, these methods exhibit a decline in performance, limiting their usage as generic tools. The contribution of this paper is to propose a new network architecture that performs well on a variety of speaker and speech domains. Our proposed model is composed of a shared encoder that takes a spectrogram as input and outputs a domain-invariant representation. Then, multiple decoders further process this representation, each responsible for predicting a different formant while considering…
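The encoder/decoder split described in the abstract can be sketched structurally as follows. This is not the released PyTorch code in mlspeech/formantstracker: it uses plain Python with random, untrained weights so it runs without dependencies, it processes spectrogram frames independently (ignoring the temporal context a real tracker would use), and its layer sizes, single-layer ReLU encoder, softmax heads and class names (`SharedEncoder`, `FormantDecoder`, `MultiDecoderTracker`) are hypothetical. The point is only the data flow: one shared representation per frame, reused by a separate decoder for each formant, each emitting a heatmap over frequency bins.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax: turns logits into a distribution over bins."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_in, n_out):
    """Randomly initialised (untrained) weights, for illustration only."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class SharedEncoder:
    """Hypothetical one-layer encoder: spectrogram frame -> shared representation."""

    def __init__(self, rng, n_freq, n_hidden):
        self.w, self.b = random_layer(rng, n_freq, n_hidden)

    def __call__(self, frame):
        return [max(0.0, h) for h in linear(self.w, self.b, frame)]  # ReLU


class FormantDecoder:
    """Hypothetical per-formant head: shared representation -> heatmap over bins."""

    def __init__(self, rng, n_hidden, n_bins):
        self.w, self.b = random_layer(rng, n_hidden, n_bins)

    def __call__(self, features):
        return softmax(linear(self.w, self.b, features))


class MultiDecoderTracker:
    """One shared encoder feeding one decoder per formant (e.g. F1, F2, F3)."""

    def __init__(self, n_freq=257, n_hidden=64, n_bins=101, n_formants=3, seed=0):
        rng = random.Random(seed)
        self.encoder = SharedEncoder(rng, n_freq, n_hidden)
        self.decoders = [FormantDecoder(rng, n_hidden, n_bins) for _ in range(n_formants)]

    def __call__(self, spectrogram):
        """spectrogram: list of frames, each a list of n_freq magnitudes.

        Returns heatmaps[t][k]: the heatmap for formant k at frame t.
        """
        out = []
        for frame in spectrogram:
            shared = self.encoder(frame)  # computed once, reused by every decoder
            out.append([decode(shared) for decode in self.decoders])
        return out


rng = random.Random(1)
spec = [[rng.random() for _ in range(257)] for _ in range(5)]  # 5 dummy frames
heatmaps = MultiDecoderTracker()(spec)
print(len(heatmaps), len(heatmaps[0]), len(heatmaps[0][0]))  # → 5 3 101
```

Computing the shared representation once per frame and fanning it out to the heads is what lets every decoder work from the same features while each formant keeps its own output distribution.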

Code & Models

Repositories

mlspeech/formantstracker
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing