AgentAvatar: Disentangling Planning, Driving and Rendering for Photorealistic Avatar Agents
Duomin Wang, Bin Dai, Yu Deng, Baoyuan Wang

TL;DR
This paper introduces AgentAvatar, a framework that combines large language models (LLMs) with neural rendering to create interactive, photorealistic avatar agents that produce nuanced facial animations from high-level inputs.
Contribution
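It presents a novel disentangled pipeline that separates planning (LLM-generated descriptions of facial motion), driving (conversion of those descriptions into motion tokens), and rendering (neural synthesis of the final frames), enabling flexible and realistic avatar animation from high-level descriptions.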
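The separation of planning, driving, and rendering can be illustrated with a minimal Python sketch. The stage interfaces (`Planner`, `Driver`, `Renderer`), the `AvatarAgent` wrapper, and the toy stage implementations are hypothetical illustrations, not the authors' code; they only show how fixed hand-off formats (text descriptions, then motion embeddings) let each stage be replaced independently.

```python
from dataclasses import dataclass
from typing import List, Protocol

Vector = List[float]


class Planner(Protocol):
    # Planning: environment + agent profile -> text descriptions of facial motion.
    def plan(self, environment: str, profile: str) -> List[str]: ...


class Driver(Protocol):
    # Driving: one text description -> continuous motion embeddings.
    # Per the abstract, the real engine first emits motion tokens that are then
    # converted to embeddings; this sketch folds both steps into one call.
    def drive(self, description: str) -> List[Vector]: ...


class Renderer(Protocol):
    # Rendering: motion embeddings -> frames (strings stand in for images here).
    def render(self, embeddings: List[Vector]) -> List[str]: ...


@dataclass
class AvatarAgent:
    planner: Planner
    driver: Driver
    renderer: Renderer

    def run(self, environment: str, profile: str) -> List[str]:
        # Each stage only consumes the previous stage's output format.
        frames: List[str] = []
        for description in self.planner.plan(environment, profile):
            frames.extend(self.renderer.render(self.driver.drive(description)))
        return frames


# --- Hypothetical toy stages, for illustration only ---
class ToyPlanner:
    def plan(self, environment: str, profile: str) -> List[str]:
        # A real planner would prompt an LLM with the environment and profile.
        return [f"{profile} smiles", f"{profile} nods"]


class ToyDriver:
    def drive(self, description: str) -> List[Vector]:
        # One 2-D "embedding" per word; a real driver would be a learned model.
        return [[float(len(word)), 0.0] for word in description.split()]


class ToyRenderer:
    def render(self, embeddings: List[Vector]) -> List[str]:
        # Stand-in for a neural renderer: label each frame by its first coordinate.
        return [f"frame<{e[0]:.0f}>" for e in embeddings]


agent = AvatarAgent(ToyPlanner(), ToyDriver(), ToyRenderer())
print(agent.run("a quiet cafe", "listener"))
# ['frame<8>', 'frame<6>', 'frame<8>', 'frame<4>']
```

Because the stages communicate only through these intermediate formats, swapping `ToyRenderer` for a different renderer would leave the planner and driver untouched, which is the practical benefit of disentangling the pipeline.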
Findings
Effective in generating photorealistic avatar animations
Versatile across monadic (single-avatar) and dyadic (two-party) non-verbal interactions
Validated on both newly compiled and existing datasets
Abstract
In this study, our goal is to create interactive avatar agents that can autonomously plan and animate nuanced facial movements realistically, from both visual and behavioral perspectives. Given high-level inputs about the environment and agent profile, our framework harnesses LLMs to produce a series of detailed text descriptions of the avatar agents' facial motions. These descriptions are then processed by our task-agnostic driving engine into motion token sequences, which are subsequently converted into continuous motion embeddings that are further consumed by our standalone neural-based renderer to generate the final photorealistic avatar animations. These streamlined processes allow our framework to adapt to a variety of non-verbal avatar interactions, both monadic and dyadic. Our extensive study, which includes experiments on both newly compiled and existing datasets featuring two…
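The step that turns discrete motion tokens into continuous motion embeddings can be sketched as a codebook lookup, as in VQ-VAE-style models. This is an assumption: the abstract does not say how the conversion is performed. The `lookup` and `upsample` helpers, the `frames_per_token` parameter, and the linear interpolation between consecutive token embeddings are hypothetical choices made only for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def lookup(tokens: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Map each discrete motion token to its codebook vector (assumed VQ-style decoding)."""
    out: List[Vector] = []
    for t in tokens:
        if not 0 <= t < len(codebook):
            raise ValueError(f"token {t} outside codebook of size {len(codebook)}")
        out.append(list(codebook[t]))
    return out


def upsample(embeddings: Sequence[Vector], frames_per_token: int) -> List[Vector]:
    """Hypothetical: linearly interpolate between consecutive token embeddings
    so the renderer receives one vector per output frame."""
    if frames_per_token < 1:
        raise ValueError("frames_per_token must be >= 1")
    if len(embeddings) < 2:
        return [list(e) for e in embeddings]
    frames: List[Vector] = []
    for a, b in zip(embeddings, embeddings[1:]):
        for k in range(frames_per_token):
            alpha = k / frames_per_token
            frames.append([(1 - alpha) * x + alpha * y for x, y in zip(a, b)])
    frames.append(list(embeddings[-1]))  # close the sequence on the last token
    return frames


codebook = [[0.0, 0.0], [1.0, 2.0], [4.0, 0.0]]  # toy 3-entry codebook
emb = lookup([1, 2], codebook)
print(upsample(emb, frames_per_token=2))
# [[1.0, 2.0], [2.5, 1.0], [4.0, 0.0]]
```

For `n` tokens the sketch yields `(n - 1) * frames_per_token + 1` frame vectors; a learned decoder could replace the interpolation without changing the token interface.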
Taxonomy
Topics: Face recognition and analysis · Human Motion and Animation · 3D Shape Modeling and Analysis
