Modeling and Estimation of Vocal Tract and Glottal Source Parameters Using ARMAX-LF Model

Kai Li; Masato Akagi; Yongwei Li; Masashi Unoki

arXiv:2410.04704 · cs.SD · October 8, 2024

Open Access

TL;DR

This paper introduces the ARMAX-LF model combined with deep neural networks for more accurate and efficient estimation of vocal tract and glottal source parameters across a wider range of speech sounds, including vowels and nasalized consonants.

Contribution

The paper proposes the ARMAX-LF model and a DNN-based estimation method, extending previous models to better handle diverse speech sounds with reduced errors and no iterative procedures.

Findings

1. ARMAX-LF model improves parameter estimation accuracy.
2. DNN-based estimation reduces errors and computation time.
3. Effective for vowels and nasalized sounds in real and synthesized speech.

Abstract

Modeling and estimation of the vocal tract and glottal source parameters of vowels from raw speech can be typically done by using the Auto-Regressive with eXogenous input (ARX) model and Liljencrants-Fant (LF) model with an iteration-based estimation approach. However, the all-pole autoregressive model in the modeling of vocal tract filters cannot provide the locations of anti-formants (zeros), which increases the estimation errors in certain classes of speech sounds, such as nasal, fricative, and stop consonants. In this paper, we propose the Auto-Regressive Moving Average eXogenous with LF (ARMAX-LF) model to extend the ARX-LF model to a wider variety of speech sounds, including vowels and nasalized consonants. The LF model represents the glottal source derivative as a parametrized time-domain model, and the ARMAX model represents the vocal tract as a pole-zero filter with an…
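The difference between an all-pole (ARX-style) vocal tract filter and a pole-zero (ARMAX-style) one can be illustrated with a generic textbook difference equation. The sketch below is only an illustration under stated assumptions: it is not the paper's formulation or estimation method, the function name `pole_zero_filter` and all coefficient values are hypothetical, and a unit impulse stands in for the glottal source derivative instead of an actual LF waveform.

```python
# Minimal sketch of a pole-zero filter, assuming the generic form
#   y[n] = sum_j b[j] * u[n-j] - sum_i a[i] * y[n-i] + e[n]
# with a_0 = 1 implied. Names and coefficients are hypothetical.

def pole_zero_filter(b, a, u, e=None):
    """Filter the excitation u with zeros given by b and poles given by a.

    b : feed-forward (moving-average) coefficients [b_0, b_1, ...]
    a : feedback (autoregressive) coefficients [a_1, a_2, ...]
    u : excitation samples (stand-in for the glottal source derivative)
    e : optional residual/noise samples, same length as u
    """
    if e is None:
        e = [0.0] * len(u)
    y = []
    for n in range(len(u)):
        acc = e[n]
        # Zeros: weighted sum of current and past excitation samples.
        for j, bj in enumerate(b):
            if n - j >= 0:
                acc += bj * u[n - j]
        # Poles: weighted sum of past output samples.
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * y[n - i]
        y.append(acc)
    return y


# Unit impulse used as a placeholder excitation (not an LF pulse).
impulse = [1.0, 0.0, 0.0, 0.0]

# All-pole case: a single pole, no zeros beyond b_0.
y_all_pole = pole_zero_filter(b=[1.0], a=[-0.9], u=impulse)

# Pole-zero case: the same pole plus one zero from b_1.
y_pole_zero = pole_zero_filter(b=[1.0, -0.5], a=[-0.9], u=impulse)
```

With `b` reduced to a single coefficient the filter has poles only, which is the case the abstract describes as unable to represent anti-formants; the extra `b` terms are what introduce zeros.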

Taxonomy

Topics: Speech Recognition and Synthesis