Large Point-to-Gaussian Model for Image-to-3D Generation

Longfei Lu; Huachen Gao; Tao Dai; Yaohua Zha; Zhi Hou; Junta Wu; Shu-Tao Xia

arXiv:2408.10935·cs.CV·August 21, 2024

TL;DR

This paper introduces a Point-to-Gaussian model that takes initial point clouds from a 3D diffusion model as a geometry prior for image-to-3D asset generation, and reports state-of-the-art results on the GSO and Objaverse datasets.

Contribution

The paper proposes a Point-to-Gaussian approach that predicts Gaussian parameters from a point cloud prior, built around an APP block (Attention mechanism, Projection mechanism, and Point feature extractor).

Findings

1. Achieves state-of-the-art performance on GSO and Objaverse datasets.

2. Significantly improves the quality and speed of image-to-3D generation.

3. Effectively utilizes initial point clouds to facilitate Gaussian parameter prediction.

Abstract

Recently, image-to-3D approaches have significantly advanced the generation quality and speed of 3D assets based on large reconstruction models, particularly 3D Gaussian reconstruction models. Existing large 3D Gaussian models directly map a 2D image to 3D Gaussian parameters, but regressing a 2D image to 3D Gaussian representations is challenging without 3D priors. In this paper, we propose a large Point-to-Gaussian model for image-to-3D generation, which takes as input the initial point cloud produced by a large 3D diffusion model conditioned on the 2D image and generates the Gaussian parameters. The point cloud provides an initial 3D geometry prior for Gaussian generation, thus significantly facilitating image-to-3D generation. Moreover, we present the Attention mechanism, Projection mechanism, and Point feature extractor, dubbed the APP block, for fusing the image…
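The abstract is truncated before it describes how the projection mechanism works, so the following is only a generic sketch of one common way to attach image features to 3D points, not the paper's APP block. It assumes a pinhole camera looking down the +z axis, points already expressed in camera coordinates, and nearest-neighbour sampling. All function names and parameters here (`project_point`, `sample_feature`, `fuse_point_and_image_features`, `focal`) are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]                 # (x, y, z) in camera coordinates
FeatureMap = List[List[List[float]]]    # indexed as [row][col][channel]


def project_point(point: Point, focal: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of one camera-space point to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return focal * x / z + cx, focal * y / z + cy


def sample_feature(feature_map: FeatureMap, u: float, v: float) -> List[float]:
    """Nearest-neighbour lookup of the feature vector at pixel (u, v), clamped to the map."""
    height, width = len(feature_map), len(feature_map[0])
    col = min(max(int(round(u)), 0), width - 1)
    row = min(max(int(round(v)), 0), height - 1)
    return list(feature_map[row][col])


def fuse_point_and_image_features(
    points: Sequence[Point], feature_map: FeatureMap, focal: float
) -> List[List[float]]:
    """Concatenate each point's xyz with the image feature it projects onto.

    Assumption: the principal point is the centre of the feature map.
    """
    height, width = len(feature_map), len(feature_map[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    fused = []
    for point in points:
        u, v = project_point(point, focal, cx, cy)
        fused.append(list(point) + sample_feature(feature_map, u, v))
    return fused


if __name__ == "__main__":
    # Toy 5x5 feature map whose two channels store the (row, col) index.
    fmap = [[[float(r), float(c)] for c in range(5)] for r in range(5)]
    pts = [(0.0, 0.0, 2.0), (1.0, 1.0, 2.0)]
    for row in fuse_point_and_image_features(pts, fmap, focal=2.0):
        print(row)
```

In a learned model, the concatenated per-point vectors would typically be passed to a network that predicts per-Gaussian parameters; that stage is omitted here because the source does not specify it.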

Taxonomy

Topics: Remote Sensing and LiDAR Applications · Advanced Vision and Imaging · Computer Graphics and Visualization Techniques

Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings