Condition Matters in Full-head 3D GANs
Heyuan Li, Huimin Zhang, Yuda Qiu, Zhengwentai Sun, Keru Zheng, Lingteng Qiu, Peihao Li, Qi Zuo, Ce Chen, Yujian Zheng, Yuming Gu, Zilong Dong, Xiaoguang Han

TL;DR
This paper introduces view-invariant semantic conditioning for full-head 3D GANs, improving training stability, diversity, and global coherence by decoupling view bias and leveraging a novel dataset extension.
Contribution
It proposes a semantic conditioning approach using view-invariant features and a new dataset extension method to enhance 3D head generation quality and diversity.
Findings
Higher fidelity and diversity in generated 3D heads
Improved global coherence across different head regions
Faster training convergence and better generalization
Abstract
Conditioning is crucial for stable training of full-head 3D GANs. Without any conditioning signal, the model suffers from severe mode collapse, making it impractical to train. However, a series of previous full-head 3D GANs conventionally choose the view angle as the conditioning input, which leads to a bias in the learned 3D full-head space along the conditional view direction. This is evident in the significant differences in generation quality and diversity between the conditional view and non-conditional views of the generated 3D heads, resulting in global incoherence across different head regions. In this work, we propose to use a view-invariant semantic feature as the conditioning input, thereby decoupling the generative capability of 3D heads from the viewing direction. To construct a view-invariant semantic condition for each training image, we create a novel synthesized head…
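The contrast the abstract draws between view conditioning and view-invariant semantic conditioning can be illustrated with a toy sketch. This is not the authors' architecture: the function names, the vector sizes, and the way the semantic feature is obtained (here just a random per-identity vector) are all assumptions made for illustration. The sketch only shows that a view-derived condition changes the generator input when the camera moves, while a semantic condition tied to the head itself does not.

```python
import math
import random


def view_condition(yaw, pitch):
    """Conventional condition: a vector derived from the camera angles only."""
    return [math.sin(yaw), math.cos(yaw), math.sin(pitch), math.cos(pitch)]


def semantic_condition(identity_feature):
    """View-invariant condition: an L2-normalized feature describing the head.

    How such a feature is extracted is not specified in the excerpt above;
    here it is simply passed in as a hypothetical vector.
    """
    norm = math.sqrt(sum(x * x for x in identity_feature)) or 1.0
    return [x / norm for x in identity_feature]


def generator_input(z, cond):
    """Concatenate latent code and condition, as a conditional generator might."""
    return list(z) + list(cond)


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(8)]                 # latent code
identity_feature = [rng.gauss(0.0, 1.0) for _ in range(4)]  # hypothetical feature

front = (0.0, 0.0)      # camera facing the front of the head
back = (math.pi, 0.0)   # camera facing the back of the head

# Same latent code, two camera views, two conditioning schemes.
view_inputs = [generator_input(z, view_condition(*v)) for v in (front, back)]
sem_inputs = [
    generator_input(z, semantic_condition(identity_feature)) for _ in (front, back)
]

view_changes_input = view_inputs[0] != view_inputs[1]
semantic_is_view_invariant = sem_inputs[0] == sem_inputs[1]
print(view_changes_input, semantic_is_view_invariant)
```

In this toy setup the semantic condition carries no camera information at all, so the generator input stays the same whichever view is rendered, which is the decoupling the abstract describes.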
Peer Reviews
Decision: ICLR 2026 Poster
- Paper story sounds generally reasonable. Very clear pitch right from the abstract.
- The paper is interesting and, despite presenting a simple observation and a fix for it, has value for the community and provides an additional important insight into training GANs. For me, reading the paper was interesting mainly because of the observation about the GAN instability and a bit less about the proposed fix for it (see Weaknesses).
- Image-filtering head is a smart idea that ensures high quality o
- Not very clear how much the size of the dataset + presence of multi-view images in the dataset is actually helping.
- Writing is often unclear.
- Teaser is also a bit unclear: the results there look just very similar to other generators, e.g., SphereHead. I think it would be clearer for the reader if some conditioning is also shown. Also, the geometry of the back of the head is not shown there, even though it's likely one of the strongest improvement points.
- My (subjective) feeling
[Originality] - How to improve the quality of 3D full-head synthesis has been a long-standing problem since the publication of EG3D, and the inherited view conditioning has been quite an annoying part. This paper proposes a new conditioning method with the help of current 2D image generation and editing models. Though the data are synthetic, they maintain a level of realism and help to learn a more geometrically consistent 3D GAN. I personally like this idea.
[Quality] - This paper conducts common quali
Overall I do not see significant weaknesses of the current manuscript.
1. The work targets a well-known instability issue in 3D-aware GANs, specifically the severe mode collapse that results from a lack of proper conditioning.
2. The idea of using view-invariant semantic features is a promising direction for decoupling identity/semantics from viewing pose, which should inherently lead to more stable and diverse generation compared to simpler conditioning methods.
3. The BalanceHead is shown to generate high-quality results, including random-view, multi-view rende
1. Experiments do not offer a clear comparison to established state-of-the-art full-head 3D-aware GANs. Without standard quantitative metrics (e.g., FID, diversity scores, LPIPS), the claimed high-fidelity and diversity are difficult to objectively verify.
2. While the conditioning is claimed to be novel, the paper needs to demonstrate the necessity of using view-invariant semantic features specifically. Without an ablation comparing this approach to simpler, non-semantic conditioning or existin
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · 3D Shape Modeling and Analysis
