FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction
Thuan Hoang Nguyen, Jiahao Luo, Yinyu Nie, Hao Li, Gordon Guocheng Qian, Jian Wang

TL;DR
FFAvatar is a fast, generalizable framework that reconstructs high-quality 3D avatars from a few images, without hours of per-subject optimization or extensive preprocessing.
Contribution
It introduces a feed-forward reconstruction framework and a multi-stage training curriculum that together achieve broad generalization and high-fidelity avatar reconstruction from a few input images.
Findings
Outperforms the state-of-the-art LAM by 5.5 PSNR on the NeRSemble benchmark.
Reconstructs avatars in 2 seconds without personalization and 10 seconds with personalization.
Supports 49 FPS animation on a single GPU.
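To put the numbers above in perspective, here is a minimal sketch of how PSNR is conventionally computed and what the reported figures imply. It assumes the 5.5 gain is expressed in decibels (the usual unit for PSNR) and that pixel values lie in [0, 1]; the function names are illustrative and do not come from the paper.

```python
import math

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes pixel intensities in [0, max_value]; this is the standard
    definition, not code taken from the paper.
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must contain the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which mean squared error shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)

def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

if __name__ == "__main__":
    print(f"MSE reduction for +5.5 dB: {mse_ratio_for_gain(5.5):.2f}x")
    print(f"Per-frame budget at 49 FPS: {frame_budget_ms(49):.1f} ms")
```

Under these assumptions, a 5.5 dB gain corresponds to roughly a 3.5x reduction in mean squared error, and 49 FPS leaves about 20 ms per animated frame.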
Abstract
Avatar reconstruction has traditionally relied on per-subject optimization that requires hours of computation or on expensive preprocessing that limits scalability. We introduce FFAvatar, a generalizable feed-forward framework that reconstructs high-quality, animatable 3D Gaussian head avatars from few-shot unposed portrait images in seconds. FFAvatar fuses information from multiple source images into a unified canonical Gaussian representation through Multi-View Query-Former, which is animated via FLAME parameters predicted end-to-end directly from pixels, eliminating the overhead of offline FLAME extraction. We further propose a three-stage training curriculum that achieves both broad generalization and high-fidelity reconstruction: (i) scalable pretraining on extensive monocular video data with over 1M identities to learn strong generalizable priors; (ii) multi-view fine-tuning on a…
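The abstract does not describe the internals of the Multi-View Query-Former, so the following is only a generic sketch of query-based fusion, not the authors' implementation. It assumes a fixed set of query vectors that cross-attend to image tokens pooled from all source views, which illustrates why the fused representation can keep the same size regardless of how many images are supplied. All names (`fuse_views`, `queries`, `views`) are hypothetical, and learned projections, multiple heads, and the Gaussian decoding step are omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse_views(queries, views):
    """Single-head cross-attention from fixed queries to tokens of all views.

    queries: list of d-dimensional vectors (hypothetical learned queries).
    views:   list of views, each a list of d-dimensional image tokens.
    Returns one fused d-dimensional vector per query.
    """
    tokens = [token for view in views for token in view]  # pool across views
    if not tokens:
        raise ValueError("at least one image token is required")
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in queries:
        weights = softmax([dot(query, token) * scale for token in tokens])
        fused.append([
            sum(w * token[i] for w, token in zip(weights, tokens))
            for i in range(dim)
        ])
    return fused

if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    one_view = [[[1.0, 2.0], [3.0, 4.0]]]
    two_views = one_view + [[[0.5, 0.5]]]
    # The number of fused vectors depends on the queries, not on the view count.
    print(len(fuse_views(queries, one_view)), len(fuse_views(queries, two_views)))
```

In this sketch the output always contains one vector per query, so adding or removing source images changes only what the queries attend to, not the shape of the result.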
