TL;DR
SOAP is a framework that generates rigged, style-agnostic 3D avatars from a single portrait. It handles diverse styles and accessories, supports detailed animation, and preserves the underlying mesh topology.
Contribution
The paper introduces two components: a multiview diffusion model trained on 24K 3D heads spanning multiple styles, and an adaptive optimization pipeline that reconstructs an avatar from a single image by deforming the FLAME mesh while preserving its topology and rigging.
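The abstract says SOAP deforms the FLAME mesh through differentiable rendering without changing its topology. The sketch below illustrates only the topology-preserving idea, under simplifying assumptions that are not from the paper: the rendering losses are replaced by a squared distance to hypothetical target vertex positions, the smoothness prior is a uniform graph Laplacian, and `fit_offsets`, `uniform_laplacian` and the toy tetrahedron are illustrative names. Only per-vertex offsets are optimized and the face list is never modified, so anything indexed by vertex (skinning weights, blendshapes) stays valid.

```python
import numpy as np


def uniform_laplacian(num_verts, faces):
    """Dense uniform graph Laplacian L = I - D^-1 A built from triangle faces."""
    adj = np.zeros((num_verts, num_verts))
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i, j] = adj[j, i] = 1.0
    deg = adj.sum(axis=1, keepdims=True)
    return np.eye(num_verts) - adj / np.maximum(deg, 1.0)


def fit_offsets(template, faces, target, lam=0.1, lr=0.2, steps=500):
    """Gradient descent on per-vertex offsets (hypothetical stand-in for the paper's loss).

    Energy: ||template + offsets - target||^2 + lam * ||L @ offsets||^2.
    `faces` is returned unchanged, so connectivity is preserved by construction.
    """
    L = uniform_laplacian(len(template), faces)
    offsets = np.zeros_like(template)
    for _ in range(steps):
        verts = template + offsets
        # Gradient of the data term ||verts - target||^2
        grad = 2.0 * (verts - target)
        # Gradient of the smoothness term lam * ||L @ offsets||^2
        grad += 2.0 * lam * L.T @ (L @ offsets)
        offsets -= lr * grad
    return template + offsets, faces
```

With a rigid shift as the target, the Laplacian term vanishes (a constant offset has no local variation), so the fit recovers the target exactly; with noisy targets, a larger `lam` trades data fidelity for smoother deformation.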
Findings
Outperforms state-of-the-art methods in single-view head modeling
Supports detailed FACS-based animation and accessory preservation
Demonstrates high-quality, style-agnostic 3D avatar generation
Abstract
Creating animatable 3D avatars from a single image remains challenging due to style limitations (realistic, cartoon, anime) and difficulties in handling accessories or hairstyles. While 3D diffusion models advance single-view reconstruction for general objects, outputs often lack animation controls or suffer from artifacts because of the domain gap. We propose SOAP, a style-omniscient framework to generate rigged, topology-consistent avatars from any portrait. Our method leverages a multiview diffusion model trained on 24K 3D heads with multiple styles and an adaptive optimization pipeline to deform the FLAME mesh while maintaining topology and rigging via differentiable rendering. The resulting textured avatars support FACS-based animation, integrate with eyeballs and teeth, and preserve details like braided hair or accessories. Extensive experiments demonstrate the superiority of our…
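The abstract states that the reconstructed avatars support FACS-based animation. One common way to drive such a rig, assumed here rather than taken from the paper, is linear blendshapes: each action unit stores a per-vertex displacement defined on the template's vertex ordering, and an expression is the neutral mesh plus a weighted sum of those displacements. This only works if the avatar keeps the template's vertex count and order, which is what topology preservation provides. The function `animate` and the array shapes are hypothetical.

```python
import numpy as np


def animate(neutral, au_deltas, weights):
    """Linear blendshape animation (assumed rig model): neutral + sum_k w_k * delta_k.

    neutral:   (V, 3) vertex positions of the reconstructed avatar
    au_deltas: (K, V, 3) per-vertex offsets, one per hypothetical action unit
    weights:   (K,) activation of each action unit, typically in [0, 1]
    """
    weights = np.asarray(weights, dtype=float)
    # Deltas are indexed by template vertex, so the layouts must match exactly
    if au_deltas.shape[1:] != neutral.shape:
        raise ValueError("blendshape deltas must match the avatar's vertex layout")
    # Contract the K axis: (K,) x (K, V, 3) -> (V, 3)
    return neutral + np.tensordot(weights, au_deltas, axes=1)
```

A production rig would typically combine such blendshapes with joint-based skinning for the jaw, neck and eyes; that part is omitted here.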
Taxonomy
Methods: Diffusion
