Large Point-to-Gaussian Model for Image-to-3D Generation
Longfei Lu, Huachen Gao, Tao Dai, Yaohua Zha, Zhi Hou, Junta Wu, Shu-Tao Xia

TL;DR
This paper introduces a novel Point-to-Gaussian model that leverages initial point clouds from 3D diffusion models to improve image-to-3D asset generation, achieving state-of-the-art results.
Contribution
The paper proposes a new Point-to-Gaussian approach with an APP block for better image-to-3D generation, integrating point cloud priors and advanced feature fusion techniques.
Findings
Achieves state-of-the-art performance on GSO and Objaverse datasets.
Significantly improves the quality and speed of image-to-3D generation.
Effectively utilizes initial point clouds to facilitate Gaussian parameter prediction.
Abstract
Recently, image-to-3D approaches based on large reconstruction models, particularly 3D Gaussian reconstruction models, have significantly advanced the quality and speed of 3D asset generation. Existing large 3D Gaussian models map a 2D image directly to 3D Gaussian parameters, but regressing 3D Gaussian representations from a 2D image is challenging without 3D priors. In this paper, we propose a large Point-to-Gaussian model for image-to-3D generation. It takes as input an initial point cloud, produced by a large 3D diffusion model conditioned on the 2D image, and generates the Gaussian parameters. The point cloud provides an initial 3D geometry prior for Gaussian generation, thus significantly facilitating image-to-3D generation. Moreover, we present the Attention mechanism, Projection mechanism, and Point feature extractor, dubbed the APP block, for fusing the image…
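The abstract names a projection mechanism for combining image and point cloud information but, as quoted here, does not say how it works. The sketch below shows one common way such a step can be done: project each 3D point onto the image plane and look up the image feature at that pixel. Everything in it is an assumption made for illustration, not the authors' method: the pinhole camera model, the nearest-neighbour lookup, and the function names `project_points`, `sample_features`, and `fuse_point_and_image_features` are all hypothetical.

```python
import numpy as np


def project_points(points, focal, height, width):
    """Project camera-space points (N, 3) to pixel coordinates (N, 2).

    Assumes a simple pinhole camera with the principal point at the
    image centre and all points in front of the camera (z > 0).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = focal * x / z + (width - 1) / 2.0   # column coordinate
    v = focal * y / z + (height - 1) / 2.0  # row coordinate
    return np.stack([u, v], axis=1)


def sample_features(feature_map, pixels):
    """Nearest-neighbour lookup of per-point features from an (H, W, C) map."""
    h, w, _ = feature_map.shape
    cols = np.clip(np.rint(pixels[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(pixels[:, 1]).astype(int), 0, h - 1)
    return feature_map[rows, cols]


def fuse_point_and_image_features(points, feature_map, focal):
    """Concatenate each point's xyz with the image feature it projects onto.

    Hypothetical fusion step: a downstream network could map the fused
    (N, 3 + C) array to per-point Gaussian parameters.
    """
    h, w, _ = feature_map.shape
    pixels = project_points(points, focal, h, w)
    feats = sample_features(feature_map, pixels)
    return np.concatenate([points, feats], axis=1)


if __name__ == "__main__":
    pts = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
    fmap = np.arange(50, dtype=float).reshape(5, 5, 2)
    fused = fuse_point_and_image_features(pts, fmap, focal=2.0)
    print(fused.shape)
```

A learned model would more likely use bilinear interpolation and features from a trained image encoder. Nearest-neighbour sampling is used here only to keep the example short.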
Taxonomy
Topics: Remote Sensing and LiDAR Applications · Advanced Vision and Imaging · Computer Graphics and Visualization Techniques
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
