TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model

Jiazhi Guan; Quanwei Yang; Kaisiyuan Wang; Hang Zhou; Shengyi He; Zhiliang Xu; Haocheng Feng; Errui Ding; Jingdong Wang; Hongtao Xie; Youjian Zhao; Ziwei Liu

arXiv:2410.10696·cs.CV·October 15, 2024

TL;DR

TALK-Act is a diffusion-based framework for 2D speaking avatar reenactment that explicitly drives the face, torso, and gestures, achieving high fidelity from only a short monocular video.

Contribution

The paper proposes a diffusion-based framework that uses explicit motion guidance, built from 2D and 3D structural information, to reenact a speaking avatar's face, torso, and gestures from limited footage.

Findings

01. High-fidelity avatar reenactment from 30 seconds of data

02. Effective control of face, torso, and gestures

03. Superior stability and realism in results

Abstract

Recently, 2D speaking avatars have increasingly participated in everyday scenarios due to the fast development of facial animation techniques. However, most existing works neglect the explicit control of human bodies. In this paper, we propose to drive not only the faces but also the torso and gesture movements of a speaking figure. Inspired by recent advances in diffusion models, we propose the Motion-Enhanced Textural-Aware ModeLing for SpeaKing Avatar Reenactment (TALK-Act) framework, which enables high-fidelity avatar reenactment from only short footage of monocular video. Our key idea is to enhance the textural awareness with explicit motion guidance in diffusion modeling. Specifically, we carefully construct 2D and 3D structural information as intermediate guidance. While recent diffusion models adopt a side network for control information injection, they fail to synthesize…

Taxonomy

Topics: Human Motion and Animation

Methods: Diffusion