MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization
Hyung Kyu Kim, Sangmin Lee, Hak Gu Kim

TL;DR
MemoryTalker is a framework that synthesizes personalized 3D facial animations from audio alone, capturing a speaker's style without requiring speaker class labels or additional 3D facial meshes at inference.
Contribution
It introduces a two-stage training process that memorizes general motions and then personalizes facial animations based on audio-driven style features, enhancing usability and realism.
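The two stages can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the motion memory is a key-value store addressed by softmax attention, and that stylization is a per-slot gate derived from an audio style feature. All names here (`retrieve_motion`, `style_gates_from_audio`, `stylize_memory`, `slot_proj`) are hypothetical, and the toy vectors are made up for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retrieve_motion(query, keys, values):
    """Stage 1 stand-in (Memorizing): read general motion from memory.

    Each value slot is weighted by the scaled similarity between the
    audio query and that slot's key (an assumed addressing scheme).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    motion = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return motion, weights

def style_gates_from_audio(style_feat, slot_proj):
    """Map an audio style feature to one sigmoid gate per memory slot (assumed)."""
    return [1.0 / (1.0 + math.exp(-dot(style_feat, p))) for p in slot_proj]

def stylize_memory(values, gates):
    """Stage 2 stand-in (Animating): rescale each memory slot by its style gate."""
    return [[g * x for x in v] for g, v in zip(gates, values)]

# Toy example: 2 memory slots, 2-d audio features, 3-d motion vectors.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
query = [1.0, 0.0]

general_motion, weights = retrieve_motion(query, keys, values)

style_feat = [0.2, -0.4]                # hypothetical audio-driven style feature
slot_proj = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical learned projection
gates = style_gates_from_audio(style_feat, slot_proj)
personal_motion, _ = retrieve_motion(query, keys, stylize_memory(values, gates))
```

In this sketch the same audio query reads from the general memory in the first stage and from the style-gated memory in the second, so the only source of personalization is the audio itself, which mirrors the paper's stated goal of needing no speaker labels or extra meshes.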
Findings
Outperforms state-of-the-art methods in personalized facial animation.
Generates realistic animations without prior speaker information.
Validated through quantitative, qualitative, and user studies.
Abstract
Speech-driven 3D facial animation aims to synthesize realistic facial motion sequences from given audio, matching the speaker's speaking style. However, previous works often require priors such as a speaker's class label or additional 3D facial meshes at inference, which prevents them from reflecting the speaking style and limits their practical use. To address these issues, we propose MemoryTalker, which enables realistic and accurate 3D facial motion synthesis by reflecting speaking style with audio input alone, maximizing usability in applications. Our framework consists of two training stages: the first stage stores and retrieves general motion (i.e., Memorizing), and the second stage performs personalized facial motion synthesis (i.e., Animating) with the motion memory stylized by the audio-driven speaking style feature. In this second stage, our model learns about which facial motion…
