CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation
Xiangyu Liang, Wenlin Zhuang, Tianyong Wang, Guangxing Geng, Guangyue Geng, Haifeng Xia, Siyu Xia

TL;DR
CSTalk is a novel method that models correlations among facial regions and uses them to supervise training, generating realistic, emotionally expressive 3D facial animations from speech and targeting the lack of naturalness and expressiveness in existing methods.
Contribution
The paper introduces CSTalk, a correlation-supervised generative approach that improves naturalness and emotional expressiveness in speech-driven 3D facial animation.
Findings
Outperforms existing state-of-the-art methods
Generates more natural and expressive facial animations
Effectively models correlations among facial regions
Abstract
Speech-driven 3D facial animation technology has been developed for years, but its practical application still falls short of expectations. The main challenges lie in data limitations, lip alignment, and the naturalness of facial expressions. Although lip alignment has been the subject of many studies, existing methods struggle to synthesize natural and realistic expressions, resulting in mechanical and stiff facial animations. Even where research extracts emotional features from speech, the randomness of facial movements limits the effective expression of emotions. To address this issue, this paper proposes a method called CSTalk (Correlation Supervised) that models the correlations among different regions of facial movements and supervises the training of the generative model to generate realistic expressions that conform to human facial motion patterns. To generate more…
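The summary does not give the paper's exact loss, so the following is only a minimal sketch of what "supervising training with correlations among facial regions" could look like. It assumes each animation has been reduced to per-frame motion signals for R facial regions (a hypothetical (T, R) array, e.g. displacement magnitude per region), computes the Pearson correlation matrix between regions, and penalizes the mean squared difference between the generated and ground-truth matrices. The function names, the region split, and the squared-error form are illustrative assumptions, not CSTalk's published formulation.

```python
import numpy as np


def region_correlation(motion: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Pearson correlation between facial-region motion signals.

    motion: (T, R) array, one column per hypothetical facial region.
    Returns an (R, R) matrix of correlations over time.
    """
    centered = motion - motion.mean(axis=0, keepdims=True)
    std = np.sqrt((centered ** 2).mean(axis=0))  # per-region std over time
    normed = centered / (std + eps)              # z-score each region's signal
    return normed.T @ normed / motion.shape[0]   # average of products = correlation


def correlation_supervision_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Assumed loss: mean squared gap between predicted and ground-truth correlations."""
    diff = region_correlation(pred) - region_correlation(target)
    return float(np.mean(diff ** 2))


# Usage: in the target, the (hypothetical) lip region moves with the jaw;
# in the prediction, the lips move independently, so the loss is positive.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
jaw = np.sin(t)
target = np.stack([jaw, 0.8 * jaw, 0.1 * rng.standard_normal(t.size)], axis=1)
pred = np.stack([jaw, rng.standard_normal(t.size), 0.1 * rng.standard_normal(t.size)], axis=1)
print(correlation_supervision_loss(target, target))  # 0.0
print(correlation_supervision_loss(pred, target))
```

Matching correlation matrices constrains how regions move together rather than where each vertex lands, which is one plausible way to discourage the uncoordinated, "random" facial movement the abstract describes; a real system would apply such a term alongside reconstruction losses.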
Taxonomy
Topics: Face recognition and analysis
Methods: Sparse Evolutionary Training
