Bringing NeRFs to the Latent Space: Inverse Graphics Autoencoder
Antoine Schnepf, Karim Kassab, Jean-Yves Franceschi, Laurent Caraffa, Flavian Vasile, Jeremie Mary, Andrew Comport, Valerie Gouet-Brunet

TL;DR
This paper introduces an Inverse Graphics Autoencoder (IG-AE) that aligns 2D image autoencoders with 3D geometry, enabling efficient latent space manipulation of NeRFs with improved quality and faster training.
Contribution
The paper proposes a novel IG-AE that regularizes image autoencoders with 3D geometry, facilitating latent space NeRF training and enhancing performance.
Findings
Latent NeRFs with IG-AE outperform standard autoencoders in quality.
Training and rendering are faster with IG-AE-based NeRFs.
Open-source implementation extends Nerfstudio for latent scene learning.
Abstract
While pre-trained image autoencoders are increasingly utilized in computer vision, the application of inverse graphics in 2D latent spaces has been under-explored. Yet, besides reducing the training and rendering complexity, applying inverse graphics in the latent space enables a valuable interoperability with other latent-based 2D methods. The major challenge is that inverse graphics cannot be directly applied to such image latent spaces because they lack an underlying 3D geometry. In this paper, we propose an Inverse Graphics Autoencoder (IG-AE) that specifically addresses this issue. To this end, we regularize an image autoencoder with 3D-geometry by aligning its latent space with jointly trained latent 3D scenes. We utilize the trained IG-AE to bring NeRFs to the latent space with a latent NeRF training pipeline, which we implement in an open-source extension of the Nerfstudio…
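As a rough illustration of the alignment idea described in the abstract, the sketch below combines a latent-space term (encoder output versus a latent rendered from a jointly trained latent scene) with an image-space term. The abstract does not give the loss terms or their weighting, so the function names, the squared-error form, and `weight_rgb` are assumptions made for illustration and not the authors' actual objective.

```python
from typing import Sequence

# Hypothetical representation: images and latents are flattened lists of floats.
Array = Sequence[float]


def mse(a: Array, b: Array) -> float:
    """Mean squared error between two equally sized flattened arrays."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def alignment_objective(
    encoded: Array,   # encoder output for a posed training image (assumed)
    rendered: Array,  # latent rendered from the latent 3D scene at the same pose (assumed)
    decoded: Array,   # decoder output for the rendered latent (assumed)
    image: Array,     # the ground-truth image
    weight_rgb: float = 1.0,  # assumed weighting, not taken from the paper
) -> float:
    """Toy objective: pull encoded latents toward 3D-consistent rendered latents,
    while keeping decoded renderings close to the real image."""
    latent_term = mse(encoded, rendered)
    rgb_term = mse(decoded, image)
    return latent_term + weight_rgb * rgb_term


if __name__ == "__main__":
    loss = alignment_objective(
        encoded=[0.0, 1.0],
        rendered=[0.0, 3.0],
        decoded=[1.0, 1.0],
        image=[1.0, 2.0],
    )
    print(loss)
```

In an actual training loop both terms would be minimized jointly over the autoencoder and the latent scenes with a differentiable framework; this plain-Python version only shows how the two terms combine.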
Peer Reviews
Decision: ICLR 2025 Poster
1. 3D-awareness of 2D image generation from autoencoders is an important issue, and this paper seems to be the first work to address it. 2. The proposed latent NeRF operates fully within the latent space, which is a standardized solution and can work as an open-source extension to established NeRF architectures. The authors submitted the code in the supplementary material. 3. The training framework of the latent NeRF and IG-AE is sensible, with a detailed ablation study to justify the loss design. 4. The paper
Although I agree with the importance of 3D-awareness of 2D autoencoders and appreciate the authors' efforts, I still have some concerns/questions for the proposed method to address 3D-awareness with latent NeRF: 1. Is a latent NeRF really necessary? Does it bring more advantage or more damage to the standard 2D AEs, especially when the scenes are quite complicated? Learning a scene with a NeRF model tends to smooth out high-frequency details (easier to be 3D-inconsistent), which is also true for
- The paper tackles the important problem of 3D-aware latent spaces. - The proposed method makes sense overall: the introduction of individually trained latent tri-planes provides an auxiliary variable that serves as "3D-aware" guidance for auto-encoded latents. - Experiments show that the proposed IG-AE is better for training NeRFs than a vanilla AE. - The method can be easily integrated into Nerfstudio with an open-source extension.
My major concerns are as follows: (1) While the proposed method is interesting and makes sense, the claimed property of a "3D-aware" latent space cannot be fully justified by the given experimental results: - (a) The proposed method is only tested on datasets with limited variation. For evaluating NeRFs, the ShapeNet dataset is not the best choice, as it contains simple shapes and textures without any non-Lambertian effects. Additionally, the test dataset contains only three categories for Shapenet a
1. **Novel Use of Inverse Graphics in Latent Space**: The authors claim to be the first to explore the relatively untapped area of applying inverse graphics in 2D latent spaces, which reduces training and rendering complexity and offers compatibility with other latent-based 2D methods. 2. **Integration of 3D Geometry into Latent Spaces**: The authors address the issue of the lack of 3D geometry in standard image latent spaces by regularizing the autoencode
1. **The writing is pretty hard to follow.** I tried to understand this paper by reading it over and over again, but still find it very hard to follow. Since this direction is relatively new, I strongly suggest that the authors revise the writing in a later version for clearer elaboration. 2. **The motivation of the proposed method is unclear.** From my understanding, the proposed method tries to align the NeRF rendering to a pretrained autoencoder (Ostris KL-f8-d16 VAE here), but what are the benefi
Taxonomy
Topics: Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis · Human Pose and Action Recognition
