Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
Taekyung Ki, Sangwon Jang, Jaehyeong Jo, Jaehong Yoon, Sung Ju Hwang

TL;DR
This paper introduces Avatar Forcing, a real-time interactive head avatar generation framework that enables lifelike, expressive reactions in virtual communication by processing multimodal user inputs with low latency and by learning expressive reactions without labeled data.
Contribution
The paper presents a diffusion-forcing-based approach for real-time, causal avatar interaction and a label-free method for learning expressive reactions, advancing interactive virtual avatar technology.
Findings
Achieves an avatar reaction latency of approximately 500 ms.
Provides a 6.8× speedup over baseline methods.
Receives over 80% user preference for reactiveness and expressiveness.
Abstract
Talking head generation creates lifelike avatars from static portraits for virtual communication and content creation. However, current models do not yet convey the feeling of truly interactive communication, often generating one-way responses that lack emotional engagement. We identify two key challenges toward truly interactive avatars: generating motion in real-time under causal constraints and learning expressive, vibrant reactions without additional labeled data. To address these challenges, we propose Avatar Forcing, a new framework for interactive head avatar generation that models real-time user-avatar interactions through diffusion forcing. This design allows the avatar to process real-time multimodal inputs, including the user's audio and motion, with low latency for instant reactions to both verbal and non-verbal cues such as speech, nods, and laughter. Furthermore, we…
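The causal, low-latency generation described in the abstract can be illustrated with a toy sketch of diffusion-forcing-style streaming inference: every frame in a sliding window carries its own noise level, each incoming user input triggers one denoising step on all frames, and the oldest frame becomes clean and is emitted immediately, so no future input is ever needed. Everything here is a hypothetical stand-in, not the authors' model: `toy_denoiser` replaces the learned network, `NUM_LEVELS` and `DIM` are arbitrary, and the user input is used directly as the denoising target (a real model would also condition on previously generated motion).

```python
import random
from collections import deque

NUM_LEVELS = 4  # hypothetical: noise levels per frame (0 = clean)
DIM = 3         # hypothetical: size of a per-frame motion latent


def toy_denoiser(x, level, condition):
    """Hypothetical denoiser: moves a latent at `level` one step toward `condition`.

    With step size 1/level, a frame denoised at levels NUM_LEVELS, ..., 1
    lands exactly on the condition seen at its final step.
    """
    alpha = 1.0 / level
    return [xi + alpha * (ci - xi) for xi, ci in zip(x, condition)]


def stream_avatar(user_inputs, seed=0):
    """Causal rolling-window generation with staggered per-frame noise levels."""
    rng = random.Random(seed)
    window = deque()  # entries: [latent, current_noise_level]
    outputs = []
    for cond in user_inputs:
        # A new frame enters the window as pure noise at the highest level.
        window.append([[rng.gauss(0.0, 1.0) for _ in range(DIM)], NUM_LEVELS])
        # One denoising step for every frame, each at its own noise level.
        for frame in window:
            frame[0] = toy_denoiser(frame[0], frame[1], cond)
            frame[1] -= 1
        # The oldest frame reaches level 0 and is emitted right away.
        if window and window[0][1] == 0:
            outputs.append(window.popleft()[0])
    return outputs


if __name__ == "__main__":
    stream = [[float(t), 0.0, -float(t)] for t in range(6)]
    for frame in stream_avatar(stream):
        print([round(v, 3) for v in frame])
```

In this sketch each emitted frame trails the input that starts it by `NUM_LEVELS - 1` steps, after which one clean frame is produced per incoming input; fewer noise levels lower the delay at the cost of fewer denoising steps per frame.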
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Social Robot Interaction and HRI
