GPAvatar: Generalizable and Precise Head Avatar from Image(s)
Xuangeng Chu, Yu Li, Ailing Zeng, Tianyu Yang, Lijian Lin, Yunfei Liu, Tatsuya Harada

TL;DR
GPAvatar is a framework that reconstructs 3D head avatars from one or several images, offering precise expression control, multi-view consistency, and effective identity preservation for applications such as VR and gaming.
Contribution
This work introduces a dynamic point-based expression field and a Multi Tri-planes Attention module for improved 3D head avatar reconstruction from limited images.
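As a rough illustration of what a point-based expression field can look like, the sketch below interpolates a feature at a 3D query location from its nearest neighbours in a point cloud. Everything here is an assumption made for illustration: the function name `query_expression_field`, the inverse-distance weighting, and the neighbour count `k` are hypothetical and are not taken from the paper.

```python
import math

def query_expression_field(query, points, features, k=3, eps=1e-8):
    """Interpolate a feature vector at `query` from its k nearest points.

    query:    (x, y, z) location to sample (hypothetical interface).
    points:   list of (x, y, z) point positions, e.g. from a head point cloud.
    features: list of per-point feature vectors, same order as `points`.
    """
    # Euclidean distance from the query to every point in the cloud.
    dists = [math.dist(query, p) for p in points]
    # Indices of the k closest points.
    nearest = sorted(range(len(points)), key=lambda i: dists[i])[:k]
    # Inverse-distance weights (an assumed choice): closer points count more.
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(weights)
    dim = len(features[0])
    # Weighted average of the neighbours' features, one channel at a time.
    return [
        sum(w * features[i][d] for w, i in zip(weights, nearest)) / total
        for d in range(dim)
    ]

# Toy data: four points with 2-channel features.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [9.0, 9.0]]
sample = query_expression_field((0.5, 0.0, 0.0), points, features, k=2)
```

In this toy setup the field is "dynamic" only in the sense that moving the points (for example, when the expression changes) changes what a fixed query location samples, while the per-point features stay attached to their points.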
Findings
Achieves faithful identity reconstruction
Provides precise expression control
Ensures multi-view consistency
Abstract
Head avatar reconstruction, crucial for applications in virtual reality, online meetings, gaming, and film industries, has garnered substantial attention within the computer vision community. The fundamental objective of this field is to faithfully recreate the head avatar and precisely control expressions and postures. Existing methods, categorized into 2D-based warping, mesh-based, and neural rendering approaches, present challenges in maintaining multi-view consistency, incorporating non-facial information, and generalizing to new identities. In this paper, we propose a framework named GPAvatar that reconstructs 3D head avatars from one or several images in a single forward pass. The key idea of this work is to introduce a dynamic point-based expression field driven by a point cloud to precisely and effectively capture expressions. Furthermore, we use a Multi Tri-planes Attention…
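One way to picture attention-based fusion of features from several input images is sketched below for a single tri-plane location. This is a minimal sketch under stated assumptions, not the paper's module: the names `softmax` and `fuse_features`, the use of a single query vector, and the scaled dot-product scoring are all hypothetical choices made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_features(query, per_image_features):
    """Fuse one feature vector per input image into a single vector.

    query:              hypothetical query vector for this location.
    per_image_features: list of feature vectors, one per input image,
                        used here as both keys and values (an assumption).
    Returns the fused feature and the attention weights.
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product score between the query and each image's feature.
    scores = [
        sum(q * f for q, f in zip(query, feat)) / scale
        for feat in per_image_features
    ]
    weights = softmax(scores)
    dim = len(per_image_features[0])
    # Attention-weighted sum of the per-image features.
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, per_image_features))
        for d in range(dim)
    ]
    return fused, weights

# With one input image the fusion reduces to the identity.
fused_single, w_single = fuse_features([1.0, 0.0], [[0.3, 0.7]])
# With several images, features that align with the query get more weight.
fused_multi, w_multi = fuse_features([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
```

A weighting of this kind handles one image and several images with the same code path, which fits the "one or several images" setting described in the abstract.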
Peer Reviews
Decision: ICLR 2024 poster
- Overall, the paper is well-organized and easy to follow. The motivation is clear. The figures and tables are informative.
- Experimental results demonstrate that the proposed method achieves the most precise expression control and state-of-the-art synthesis quality compared with StyleHeat, ROME, OTAvatar, and Next3D (based on NeRF and 3D generative models) on the VFHQ and HDTF benchmark datasets.
- The proposed model has more trainable parameters overall than the baseline models, which could make the comparison with other works unfair.
- No discussion about the limitations of the approach?
+ The proposed method achieves promising results, and the supplementary videos show that the videos synthesized by the proposed method are in general realistic and visually pleasing.
+ The experimental evaluation is detailed and systematic. The proposed method is compared with several recent SOTA methods that solve the same problem.
- The presentation in several parts of the paper, especially in the methodology, is unclear and needs several clarifications. See detailed comments in Questions below.
- The following paper is not cited, despite the fact that it is very closely related in terms of methodology: Athar, S., Shu, Z. and Samaras, D., 2023, January. Flame-in-nerf: Neural control of radiance fields for free view face animation. In 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG).
- I like the idea of PEF since it makes good use of the geometry prior of FLAME to help learn the avatar animation in 3D space.
- Also, it is the first one-shot 3D talking face paper that focuses on the few-shot setting.
- The paper is well-written and easy to follow.
- The PEF handles the segments modeled by FLAME (such as head and torso) well, but it cannot handle other parts, such as hair and clothes. See question 1.
- The identity similarity in the demo video is worse than that of some baselines (HideNeRF).
- The image quality can be improved. For instance, in Figure 1, the predicted images in the second column seem blurry.
Taxonomy
Topics: Advanced Vision and Imaging · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
