Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis

Ingmar Steiner (INRIA Lorraine - LORIA; Trinity College Dublin), Korin Richmond (CSTR), Slim Ouni (INRIA Lorraine - LORIA)

arXiv:1209.4982 · cs.HC · September 25, 2012

TL;DR

This paper explores the use of multimodal speech production data to enhance the animation of intraoral articulators, aiming to improve audiovisual speech synthesis quality by integrating detailed articulatory modeling.

Contribution

It introduces a data-driven approach to animate intraoral articulators using multimodal speech production data, advancing beyond simple rule-based methods.

Findings

01. Multimodal data improves articulatory animation quality.

02. Enhanced intraoral articulator modeling leads to more realistic AV speech.

03. Data-driven methods outperform traditional viseme morphing techniques.

Abstract

The importance of modeling speech articulation for high-quality audiovisual (AV) speech synthesis is widely acknowledged. Nevertheless, while state-of-the-art, data-driven approaches to facial animation can make use of sophisticated motion capture techniques, the animation of the intraoral articulators (viz. the tongue, jaw, and velum) typically makes use of simple rules or viseme morphing, in stark contrast to the otherwise high quality of facial modeling. Using appropriate speech production data could significantly improve the quality of articulatory animation for AV synthesis.

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing