InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
Yuchi Wang, Junliang Guo, Jianhong Bai, Runyi Yu, Tianyu He, Xu Tan, Xu Sun, Jiang Bian

TL;DR
InstructAvatar is a novel text-guided framework that enables fine-grained control of emotion and facial motion in avatar videos, improving expressiveness and naturalness over existing methods.
Contribution
It introduces a new automatic annotation pipeline and a two-branch diffusion model for simultaneous audio- and text-driven avatar generation.
Findings
Outperforms existing methods in emotion control and lip-sync quality.
Produces more natural and expressive avatar videos.
Demonstrates strong alignment with text and audio instructions.
Abstract
Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable. In this paper, we propose a novel text-guided approach for generating emotionally expressive 2D avatars, offering fine-grained control, improved interactivity, and generalizability to the resulting video. Our framework, named InstructAvatar, leverages a natural language interface to control the emotion as well as the facial motion of avatars. Technically, we design an automatic annotation pipeline to construct an instruction-video paired training dataset, equipped with a novel two-branch diffusion-based generator to predict avatars with audio and text instructions at the same time. Experimental results…
Taxonomy
Topics: Human Motion and Animation · Educational Games and Gamification · Social Robot Interaction and HRI
Methods: ALIGN
