MUA: Mobile Ultra-detailed Animatable Avatars
Heming Zhu, Guoxing Sun, Marc Habermann

TL;DR
This paper introduces a novel avatar representation that combines high fidelity and efficiency, enabling real-time, high-quality animatable avatars on resource-constrained devices by leveraging wavelet spectral decomposition and a distillation pipeline.
Contribution
The paper proposes Wavelet-guided Multi-level Spatial Factorized Blendshapes and a distillation method that together achieve ultra-detailed avatars with significantly reduced computational cost and model size.
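To make the idea of a multi-level wavelet decomposition concrete, here is a minimal sketch using the orthonormal Haar wavelet on a 1D signal. It is not the paper's representation or pipeline: the function names `haar_decompose` and `haar_reconstruct`, the choice of the Haar basis, and the toy signal are all assumptions made for illustration. The sketch only shows how a signal splits into a coarse approximation plus per-level detail bands, and how dropping the finest band yields a smoother, cheaper approximation.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_decompose(signal, levels):
    """Split a signal into a coarse approximation and per-level detail bands.

    Hypothetical helper for illustration; details[0] is the finest band.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2 at every level")
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail band: scaled differences of neighbouring samples.
        details.append([(a - b) / SQRT2 for a, b in pairs])
        # Approximation: scaled sums, passed on to the next (coarser) level.
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose by merging bands from coarsest to finest."""
    out = list(approx)
    for band in reversed(details):
        merged = []
        for a, d in zip(out, band):
            merged.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        out = merged
    return out

# Toy signal standing in for one row of a surface-detail map (assumption).
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_decompose(signal, levels=2)

# Full reconstruction recovers the signal up to floating-point error.
restored = haar_reconstruct(approx, details)

# Zeroing the finest band keeps only the coarser structure.
coarse_details = [[0.0] * len(details[0])] + details[1:]
smoothed = haar_reconstruct(approx, coarse_details)
```

Because each band captures detail at a different spatial scale, a representation of this kind can in principle allocate capacity per level, which is the general motivation for wavelet-guided designs.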
Findings
Achieves up to 2000× lower computational cost than high-quality models.
Runs at 180 FPS on a desktop and 24 FPS on a Meta Quest 3.
Outperforms existing mobile avatar methods in quality and efficiency.
Abstract
Building photorealistic, animatable full-body digital humans remains a longstanding challenge in computer graphics and vision. Recent advances in animatable avatar modeling have largely progressed along two directions: improving the fidelity of dynamic geometry and appearance, or reducing computational complexity to enable deployment on resource-constrained platforms, e.g., VR headsets. However, existing approaches fail to achieve both goals simultaneously: Ultra-high-fidelity avatars typically require substantial computation on server-class GPUs, whereas lightweight avatars often suffer from limited surface dynamics, reduced appearance details, and noticeable artifacts. To bridge this gap, we propose a novel animatable avatar representation, termed Wavelet-guided Multi-level Spatial Factorized Blendshapes, and a corresponding distillation pipeline that transfers motion-aware clothing…
