Latent Radiance Fields with 3D-aware 2D Representations
Chaoyi Zhou, Xi Liu, Feng Luo, Siyu Huang

TL;DR
This paper introduces a novel framework that integrates 3D awareness into 2D latent representations, enabling photorealistic 3D reconstruction from 2D features with improved consistency and generalization.
Contribution
The work presents a three-stage approach combining correspondence-aware autoencoding, latent radiance fields, and VAE-RF alignment to enhance 3D reconstruction from 2D latent spaces.
Findings
Outperforms state-of-the-art latent 3D reconstruction methods in synthesis quality
Demonstrates strong cross-dataset generalization
Achieves photorealistic 3D reconstructions from 2D latent representations
Abstract
Latent 3D reconstruction has shown great promise in empowering 3D semantic understanding and 3D generation by distilling 2D features into the 3D space. However, existing approaches struggle with the domain gap between 2D feature space and 3D representations, resulting in degraded rendering performance. To address this challenge, we propose a novel framework that integrates 3D awareness into the 2D latent space. The framework consists of three stages: (1) a correspondence-aware autoencoding method that enhances the 3D consistency of 2D latent representations, (2) a latent radiance field (LRF) that lifts these 3D-aware 2D representations into 3D space, and (3) a VAE-Radiance Field (VAE-RF) alignment strategy that improves image decoding from the rendered 2D representations. Extensive experiments demonstrate that our method outperforms the state-of-the-art latent 3D reconstruction…
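The abstract does not spell out how stage (1) enforces 3D consistency. As a rough illustration only, one way to picture a correspondence-aware constraint is a penalty on the distance between latent vectors at pixels that are matched across two views. The function name `correspondence_loss`, the squared-distance penalty, and the assumption that matches are given as integer pixel coordinates are all hypothetical choices for this sketch, not the authors' formulation.

```python
import numpy as np

def correspondence_loss(latent_a, latent_b, pix_a, pix_b):
    """Hypothetical consistency penalty between two views' latents.

    latent_a, latent_b: (H, W, C) latent feature maps of two views.
    pix_a, pix_b: (N, 2) integer (row, col) coordinates of matched points,
                  assumed to come from some external matcher.
    Returns the mean squared L2 distance between matched latent vectors.
    """
    feats_a = latent_a[pix_a[:, 0], pix_a[:, 1]]  # (N, C) latents in view A
    feats_b = latent_b[pix_b[:, 0], pix_b[:, 1]]  # (N, C) latents in view B
    return float(np.mean(np.sum((feats_a - feats_b) ** 2, axis=1)))

# Toy usage: identical latents at identical locations give zero penalty.
rng = np.random.default_rng(0)
latent_a = rng.normal(size=(8, 8, 4))
latent_b = latent_a.copy()
pix = np.array([[0, 0], [3, 5], [7, 2]])
print(correspondence_loss(latent_a, latent_b, pix, pix))  # 0.0
```

In a training setup, a term of this kind would presumably be added to the autoencoder's reconstruction objective so that matched points in different views map to similar latents; how the paper weights or defines such a term is not stated in this summary.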
Peer Reviews
Decision: ICLR 2025 Poster
There are many innovations in this work, but I think the best part is the introduction of 3D awareness into the 2D representation training. In particular, the correspondence-aware autoencoding is key to the success of the overall idea.
Some weaknesses still prevented me from giving a higher score, especially the lack of detail on how each component of the pipeline is computed. Please see my questions below. In addition, some related references are missing.
The authors set out to integrate 3D awareness into the 2D latent space, and the results show a significant degree of success. Additionally, using 3D Gaussian Splatting (3DGS) to model the latent space is an intriguing idea.
The motivation of this paper is somewhat unclear. Are the authors aiming to improve reconstruction accuracy, enhance rendering speed, reduce storage space, or achieve some other application? It appears that none of these goals has been fully addressed. Reconstruction accuracy: When training the comparison methods, the authors down-scaled the RGB images to the same resolution as the latent representation before training, which may be considered unfair. The VAE used by the authors has been expos
- The paper follows a standard pipeline structure addressing latent space 3D reconstruction. The method section breaks down into three components: correspondence-aware encoding, latent radiance field construction, and VAE alignment. The ablation study provides basic validation of these components, though more comprehensive analysis would be beneficial.
- While building heavily on existing techniques, the paper demonstrates competent engineering in combining different elements into a working syst
- The paper fails to provide compelling justification for operating in latent space. While previous works like Latent-NeRF (for text-to-3D generation) established initial groundwork, this paper does not clearly demonstrate additional benefits of its approach. The motivation for operating in latent space remains questionable. The paper shows modest improvements in PSNR/SSIM metrics but doesn't address fundamental questions: What are the computational advantages over image-space methods? How does
Taxonomy
Topics: Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis
