Physics-Informed Neural Networks for Speech Production
Kazuya Yokota, Ryosuke Harakawa, Masaaki Baba, Masahiro Iwahashi

TL;DR
This paper introduces a physics-informed neural network (PINN) approach to speech production analysis. It models vocal-fold vibration and vocal-tract acoustics directly from their governing equations, enabling both forward and inverse analysis of speech signals.
Contribution
The paper presents a PINN-based method that incorporates physical models of speech production and addresses two obstacles: the nondifferentiability introduced by vocal-fold collisions and the unknown period of self-excited vibration.
Findings
Models vocal-fold vibration, including collisions, by replacing nondifferentiable terms with a differentiable approximation.
Enables simultaneous estimation of glottal flow, vocal-fold state, and pressure.
Applies to both forward and inverse analysis of speech production.
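As an illustration of the first finding, the sketch below smooths a ramp-shaped contact term with a softplus function. This summary does not state which approximation function the authors use, so softplus, the sharpness parameter beta, and all function names here are assumptions chosen for illustration only.

```python
import math

def ramp(x):
    """Nondifferentiable contact term: zero when open, linear when overlapping."""
    return max(0.0, x)

def smooth_ramp(x, beta=50.0):
    """Softplus approximation of ramp(x) (assumed form); larger beta sharpens the corner."""
    z = beta * x
    # Numerically stable evaluation of log(1 + exp(z)) / beta.
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta

def smooth_ramp_grad(x, beta=50.0):
    """Derivative of smooth_ramp: a logistic sigmoid, defined for every x."""
    z = beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Unlike the ramp, whose slope jumps from 0 to 1 at the contact point, the smoothed version has a well-defined slope everywhere and a usable nonzero gradient just before contact. The gradient still decays far from contact, so beta trades accuracy against trainability.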
Abstract
The analysis of speech production based on physical models of the vocal folds and vocal tract is essential for studies on vocal-fold behavior and linguistic research. This paper proposes a speech production analysis method using physics-informed neural networks (PINNs). The networks are trained directly on the governing equations of vocal-fold vibration and vocal-tract acoustics. Vocal-fold collisions introduce nondifferentiability and vanishing gradients, challenging phenomena for PINNs. We demonstrate, however, that introducing a differentiable approximation function enables the analysis of vocal-fold vibrations within the PINN framework. The period of self-excited vocal-fold vibration is generally unknown. We show that by treating the period as a learnable network parameter, a periodic solution can be obtained. Furthermore, by implementing the coupling between glottal flow and…
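The idea of treating the period as a learnable parameter can be sketched on a toy problem. The example below is not the paper's vocal-fold model: it assumes a simple harmonic oscillator, a fixed periodic trial solution in place of a neural network, and plain gradient descent on the period alone. All names (OMEGA, residual, loss, fit_period) are hypothetical.

```python
import math

OMEGA = 2.0 * math.pi  # toy oscillator x'' + OMEGA**2 * x = 0; its true period is 1.0

def residual(tau, period):
    """ODE residual of the trial solution x = cos(2*pi*tau), where tau = t / period."""
    k = 2.0 * math.pi / period          # chain rule: d/dt = (1 / period) * d/dtau
    x = math.cos(2.0 * math.pi * tau)
    x_tt = -(k ** 2) * x                # second time derivative of the trial solution
    return x_tt + OMEGA ** 2 * x

def loss(period, n=64):
    """Mean squared residual over collocation points on one normalized period."""
    return sum(residual(i / n, period) ** 2 for i in range(n)) / n

def fit_period(period=1.3, lr=1e-4, steps=500, eps=1e-6):
    """Gradient descent on the period, using a central-difference gradient."""
    for _ in range(steps):
        grad = (loss(period + eps) - loss(period - eps)) / (2.0 * eps)
        period -= lr * grad
    return period
```

Because time is normalized by the period, the trial solution is periodic by construction, and the physics residual depends on the period only through the chain-rule factor. Minimizing the residual therefore drives the period toward the value consistent with the governing equation. In a full PINN, the period would be optimized jointly with the network weights.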
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Voice and Speech Disorders
