Visual-speech Synthesis of Exaggerated Corrective Feedback

Yaohua Bu; Weijun Li; Tianyi Ma; Shengqi Chen; Jia Jia; Kun Li; Xiaobo Lu

arXiv:2009.05748 · eess.AS · December 16, 2020

TL;DR

This paper introduces a novel method for exaggerated visual-speech feedback in language learning, combining neural speech synthesis with visual exaggeration techniques to improve learners' pronunciation skills.

Contribution

It presents a new approach integrating neural speech synthesis and visual exaggeration for enhanced pronunciation feedback in language learning.

Findings

01

Exaggerated feedback helps learners identify their mispronunciations better than non-exaggerated feedback.

02

Exaggerated feedback helps learners improve their pronunciation more than non-exaggerated feedback.

03

User studies support both results.

Abstract

To give second language (L2) learners more discriminative feedback so that they can better identify their mispronunciations, we propose a method for exaggerated visual-speech feedback in computer-assisted pronunciation training (CAPT). The speech exaggeration is realized by an emphatic speech generation neural network based on Tacotron, while the visual exaggeration is accomplished by ADC Viseme Blending, namely increasing the Amplitude of movement, extending the phone's Duration, and enhancing the color Contrast. User studies show that exaggerated feedback outperforms the non-exaggerated version in helping learners with pronunciation identification and pronunciation improvement.
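The three ADC adjustments named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual blending procedure, so everything below is an assumption made for illustration: the `Viseme` record, the `exaggerate` and `stretch_contrast` helpers, the gain values, and the choice to stretch color contrast around a mid-gray of 128 are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Viseme:
    """Hypothetical record for one viseme keyframe (not from the paper)."""
    phone: str
    amplitude: float              # articulator displacement, normalized to [0, 1]
    duration_ms: float            # how long the phone's viseme is held
    color: Tuple[int, int, int]   # RGB used to render the articulator


def stretch_contrast(color: Tuple[int, int, int], factor: float,
                     mid: int = 128) -> Tuple[int, int, int]:
    """Push each channel away from mid-gray by `factor`, clamped to 0..255."""
    return tuple(max(0, min(255, round(mid + (c - mid) * factor))) for c in color)


def exaggerate(v: Viseme, amp_gain: float = 1.5, dur_gain: float = 1.5,
               contrast_gain: float = 1.5) -> Viseme:
    """Apply the A, D, and C adjustments to one viseme.

    The gains are illustrative assumptions, not values from the paper.
    """
    return replace(
        v,
        amplitude=min(1.0, v.amplitude * amp_gain),            # A: larger movement
        duration_ms=v.duration_ms * dur_gain,                  # D: longer phone
        color=stretch_contrast(v.color, contrast_gain),        # C: stronger contrast
    )


plain = Viseme(phone="ae", amplitude=0.4, duration_ms=100.0, color=(200, 100, 128))
emphatic = exaggerate(plain)
print(emphatic)
```

In a sketch like this, only the visemes for the mispronounced phone would be passed through `exaggerate`, leaving the rest of the utterance unchanged so that the corrected segment stands out.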

Taxonomy

Topics: Speech and Audio Processing · Phonetics and Phonology Research · Speech Recognition and Synthesis

Methods: Sigmoid Activation · Highway Layer · Highway Network · Batch Normalization · Residual Connection · Max Pooling · Bidirectional GRU · Convolution · CBHG · Tanh Activation