Visual-speech Synthesis of Exaggerated Corrective Feedback
Yaohua Bu, Weijun Li, Tianyi Ma, Shengqi Chen, Jia Jia, Kun Li, Xiaobo Lu

TL;DR
This paper introduces a novel method for exaggerated visual-speech feedback in language learning, combining neural speech synthesis with visual exaggeration techniques to improve learners' pronunciation skills.
Contribution
It presents a new approach integrating neural speech synthesis and visual exaggeration for enhanced pronunciation feedback in language learning.
Findings
Exaggerated feedback helps learners identify their mispronunciations better than non-exaggerated feedback.
Exaggerated feedback also leads to greater pronunciation improvement than non-exaggerated feedback.
User studies confirm the effectiveness of the approach.
Abstract
To provide more discriminative feedback that helps second language (L2) learners better identify their mispronunciations, we propose a method for exaggerated visual-speech feedback in computer-assisted pronunciation training (CAPT). The speech exaggeration is realized by an emphatic speech generation neural network based on Tacotron, while the visual exaggeration is accomplished by ADC Viseme Blending, namely increasing the Amplitude of movement, extending the phone's Duration, and enhancing the color Contrast. User studies show that exaggerated feedback outperforms the non-exaggerated version in helping learners with pronunciation identification and pronunciation improvement.
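The three ADC adjustments named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Viseme` structure, the `exaggerate` function, the gain values, and the choice to scale color contrast around mid-gray are all assumptions made for illustration only.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Viseme:
    """Hypothetical viseme keyframe (not a structure from the paper)."""
    phone: str
    weights: Tuple[float, ...]         # blendshape weights in [0, 1]; 0 = neutral pose
    duration_ms: float                 # how long the phone's mouth shape is held
    color: Tuple[float, float, float]  # RGB highlight color, each channel in [0, 1]


def _clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def exaggerate(v: Viseme,
               amp_gain: float = 1.5,
               dur_gain: float = 1.5,
               contrast_gain: float = 1.5) -> Viseme:
    """Apply the three ADC adjustments with assumed, illustrative gains."""
    # Amplitude: push each blendshape weight further from the neutral pose.
    weights = tuple(_clamp(w * amp_gain) for w in v.weights)
    # Duration: hold the phone's mouth shape longer.
    duration_ms = v.duration_ms * dur_gain
    # Contrast: stretch each color channel away from mid-gray (0.5).
    color = tuple(_clamp(0.5 + (c - 0.5) * contrast_gain) for c in v.color)
    return replace(v, weights=weights, duration_ms=duration_ms, color=color)


if __name__ == "__main__":
    neutral = Viseme("ae", (0.4, 0.8), 100.0, (0.75, 0.5, 0.25))
    print(exaggerate(neutral, amp_gain=1.5, dur_gain=2.0, contrast_gain=2.0))
```

Clamping keeps the exaggerated weights and colors inside a valid range, so a large gain saturates the mouth shape rather than producing an impossible pose.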
Taxonomy
Topics: Speech and Audio Processing · Phonetics and Phonology Research · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Highway Layer · Highway Network · Batch Normalization · Residual Connection · Max Pooling · Bidirectional GRU · Convolution · CBHG · Tanh Activation
