ViSA: 3D-Aware Video Shading for Real-Time Upper-Body Avatar Creation
Fan Yang, Heyuan Li, Peihao Li, Weihao Yuan, Lingteng Qiu, Chaoyue Song, Cheng Chen, Yisheng He, Shifeng Zhang, Xiaoguang Han, Steven Hoi, Guosheng Lin

TL;DR
This paper introduces ViSA, a real-time system that combines 3D reconstruction with video diffusion models to generate high-fidelity, dynamic upper-body avatars with realistic appearance and motion, reducing the blurry textures and stiff motion that affect previous reconstruction-based methods.
Contribution
We propose a novel framework that integrates 3D reconstruction with autoregressive video diffusion for stable, photorealistic avatar creation in real time.
Findings
Significantly reduces texture blur and motion stiffness.
Achieves high visual quality and structural consistency.
Enables real-time avatar synthesis for VR and gaming.
Abstract
Generating high-fidelity upper-body 3D avatars from a single input image remains a significant challenge. Current 3D avatar generation methods, which rely on large reconstruction models, are fast and produce stable body structures, but they often suffer from artifacts such as blurry textures and stiff, unnatural motion. In contrast, generative video models show promising performance by synthesizing photorealistic and dynamic results, but they frequently behave unstably, producing body structural errors and identity drift. To address these limitations, we propose a novel approach that combines the strengths of both paradigms. Our framework employs a 3D reconstruction model to provide robust structural and appearance priors, which in turn guide a real-time autoregressive video diffusion model for rendering. This process enables the model to synthesize…
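The abstract describes the pipeline only at a high level, and its text is truncated. The following is a minimal sketch of one plausible reading of "3D priors guiding an autoregressive video diffusion model", assuming that the reconstruction model renders a per-frame structural prior, and that the video model generates frames in fixed-size chunks, each conditioned on its prior and on the last few frames already produced. Every name here (`ChunkedGenerator`, `render_prior`, `denoise`, `chunk_size`, `context_len`, `steps`) is hypothetical, and the toy denoiser is a placeholder that simply pulls noise toward the prior; it is not ViSA's model.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]  # toy stand-in for an image: a flat list of pixel values


@dataclass
class ChunkedGenerator:
    # Hypothetical components; none of these names come from the paper.
    render_prior: Callable[[int], Frame]                    # 3D-reconstruction render for frame t
    denoise: Callable[[Frame, Frame, List[Frame]], Frame]   # one conditional denoising step
    chunk_size: int = 4    # frames generated per autoregressive step
    context_len: int = 2   # previously generated frames fed back as context
    steps: int = 3         # denoising iterations per frame

    def generate(self, num_frames: int) -> List[Frame]:
        video: List[Frame] = []
        for start in range(0, num_frames, self.chunk_size):
            end = min(start + self.chunk_size, num_frames)
            # Context is fixed at the start of each chunk: the tail of what was already generated.
            context = video[-self.context_len:]
            for t in range(start, end):
                prior = self.render_prior(t)
                x = [0.0] * len(prior)  # fixed "noise" (zeros) so the sketch is deterministic
                for _ in range(self.steps):
                    x = self.denoise(x, prior, context)
                video.append(x)
        return video


def toy_prior(t: int) -> Frame:
    # Pretend the 3D model renders a constant-intensity frame for pose index t.
    return [float(t), float(t)]


seen_context: List[int] = []  # records how many context frames each denoising call received


def toy_denoise(x: Frame, prior: Frame, context: List[Frame]) -> Frame:
    seen_context.append(len(context))
    # Move halfway toward the structural prior (a stand-in for a learned denoiser).
    return [xi + 0.5 * (pi - xi) for xi, pi in zip(x, prior)]


gen = ChunkedGenerator(render_prior=toy_prior, denoise=toy_denoise)
video = gen.generate(num_frames=10)
```

In this toy, the prior supplies structure (each frame converges toward its rendered target), while the context list is where a real model would read earlier frames to keep identity and motion consistent across chunks.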
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Human Motion and Animation
