RGB2Point: 3D Point Cloud Generation from Single RGB Images

Jae Joong Lee; Bedrich Benes

arXiv:2407.14979 · cs.CV · December 6, 2024



TL;DR

RGB2Point is a Transformer-based method that efficiently generates high-quality 3D point clouds from single RGB images, outperforming prior CNN and diffusion models in accuracy, consistency, and speed.

Contribution

This work introduces a novel Transformer-based approach for single-image 3D point cloud generation, achieving superior quality and efficiency over existing CNN and diffusion methods.

Findings

01. Achieves a 51.15% improvement in Chamfer distance on real-world data.

02. Produces 63.1% more consistent results across categories.

03. Generates results 15,133 times faster than state-of-the-art diffusion models.
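Chamfer distance, used in the first finding, measures how closely two point sets overlap by averaging nearest-neighbour distances in both directions. The sketch below is minimal and makes two assumptions. First, it uses the common mean-of-squared-distances convention, since the exact variant the paper uses is not stated here. Second, it reads "improvement" as a relative reduction from the baseline value. `chamfer_distance` and `relative_improvement` are hypothetical helper names, not the authors' code.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _sq_dist(a: Point, b: Point) -> float:
    # Squared Euclidean distance between two 3D points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_distance(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed convention: mean of squared
    nearest-neighbour distances in each direction, the two terms summed).

    Papers differ (squared vs. unsquared, sum vs. mean, halving), so values
    are only comparable under the same convention. Brute force, O(|p| * |q|).
    """
    p_to_q = sum(min(_sq_dist(a, b) for b in q) for a in p) / len(p)
    q_to_p = sum(min(_sq_dist(b, a) for a in p) for b in q) / len(q)
    return p_to_q + q_to_p


def relative_improvement(baseline: float, ours: float) -> float:
    # Percentage reduction of a lower-is-better metric relative to a baseline
    # (assumed definition of "improvement").
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
    print(chamfer_distance(pred, gt))        # 1.0
    print(relative_improvement(2.0, 1.0))    # 50.0
```

Real evaluations run this on thousands of points with a GPU nearest-neighbour search. The brute-force loop here only makes the definition explicit.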

Abstract

We introduce RGB2Point, a Transformer-based method that generates a 3D point cloud from a single unposed RGB image. RGB2Point takes an input image of an object and generates a dense 3D point cloud. Unlike prior works based on CNN layers or diffusion denoising, we use pre-trained Transformer layers that are fast and produce high-quality point clouds with consistent quality across the available categories. Our generated point clouds achieve high quality on a real-world dataset, as evidenced by improvements in Chamfer distance (51.15%) and Earth Mover's distance (45.96%) over the current state of the art. Our approach also performs better on a synthetic dataset, improving Chamfer distance (39.26%), Earth Mover's distance (26.95%), and F-score (47.16%). Moreover, our method produces 63.1% more consistent high-quality results across various object…
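The abstract also reports Earth Mover's distance (EMD) and F-score. The sketch below shows both definitions under stated assumptions. EMD is taken as the mean Euclidean distance under the optimal one-to-one matching of two equal-size sets. It is solved here by brute force over all permutations, which is only feasible for a handful of points; practical evaluations use approximate solvers. F-score is the harmonic mean of precision and recall at a distance threshold `tau`. The threshold value and the helper names `emd_bruteforce` and `f_score` are hypothetical, and the paper's exact settings are not given here.

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def emd_bruteforce(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Exact EMD for tiny, equal-size point sets: the minimum mean
    Euclidean distance over all one-to-one matchings (assumed convention)."""
    if len(p) != len(q):
        raise ValueError("this sketch assumes equal-size point sets")
    best = math.inf
    for perm in itertools.permutations(range(len(q))):
        cost = sum(math.dist(p[i], q[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    return best / len(p)


def f_score(pred: Sequence[Point], gt: Sequence[Point], tau: float) -> float:
    """F-score at threshold tau (tau is an assumed, user-chosen parameter)."""
    # Precision: fraction of predicted points within tau of some ground-truth point.
    precision = sum(
        1 for a in pred if min(math.dist(a, b) for b in gt) < tau
    ) / len(pred)
    # Recall: fraction of ground-truth points within tau of some predicted point.
    recall = sum(
        1 for b in gt if min(math.dist(b, a) for a in pred) < tau
    ) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(emd_bruteforce(a, b))  # 1.0
    print(f_score(a, [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], tau=0.5))  # 0.5
```

Unlike Chamfer distance, EMD forces a bijection, so it penalizes predictions that cluster points in one region while leaving other regions of the shape empty.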


Taxonomy

Topics: 3D Surveying and Cultural Heritage · Remote Sensing and LiDAR Applications · 3D Shape Modeling and Analysis

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Diffusion · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention
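The Methods tags refer to standard Transformer components. The sketch below illustrates the core operation they share: single-head scaled dot-product attention, softmax(QKᵀ/√d_k)V, from "Attention Is All You Need", written in pure Python for readability. It is a generic illustration, not RGB2Point's implementation. Real models add learned projections, multiple heads, batching, and tensor libraries.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V for one head, without masking.

    q: n_q x d_k queries, k: n_k x d_k keys, v: n_k x d_v values.
    Returns an n_q x d_v matrix: each row is a weighted average of value rows.
    """
    d_k = len(k[0])
    out: Matrix = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
            for k_row in k
        ]
        weights = softmax(scores)
        # Mix the value vectors column by column using the attention weights.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out


if __name__ == "__main__":
    # A zero query scores every key equally, so it averages the values.
    print(scaled_dot_product_attention(
        [[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]
    ))  # [[2.0, 3.0]]
```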