RGB2Point: 3D Point Cloud Generation from Single RGB Images
Jae Joong Lee, Bedrich Benes

TL;DR
RGB2Point is a Transformer-based method that efficiently generates high-quality 3D point clouds from single RGB images, outperforming prior CNN and diffusion models in accuracy, consistency, and speed.
Contribution
This work introduces a novel Transformer-based approach for single-image 3D point cloud generation, achieving superior quality and efficiency over existing CNN and diffusion methods.
Findings
Achieves 51.15% improvement in Chamfer distance on real-world data.
Produces 63.1% more consistent results across categories.
Generates results 15,133x faster than state-of-the-art diffusion models.
Abstract
We introduce RGB2Point, a Transformer-based method that generates a 3D point cloud from a single unposed RGB image. RGB2Point takes an input image of an object and generates a dense 3D point cloud. In contrast to prior work based on CNN layers and diffusion denoising, we use pre-trained Transformer layers that are fast and generate point clouds of consistently high quality across the available categories. Our generated point clouds achieve high quality on a real-world dataset, improving Chamfer distance by 51.15% and Earth Mover's distance by 45.96% over the current state of the art. Our approach also performs better on a synthetic dataset, improving Chamfer distance (39.26%), Earth Mover's distance (26.95%), and F-score (47.16%). Moreover, our method produces 63.1% more consistent high-quality results across various object…
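The abstract reports results in Chamfer distance and F-score. The following brute-force Python sketch shows one common way these point-cloud metrics are computed. It is not the paper's evaluation code. It assumes squared Euclidean nearest-neighbour distances, averaged over each cloud and summed over both directions for Chamfer distance, and a distance threshold `tau` for F-score. The function names and the default `tau` are hypothetical, and the paper's exact conventions (normalization, squared vs. unsquared distances, threshold value) may differ. Earth Mover's distance is omitted because it requires an optimal one-to-one matching between the two clouds.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest_sq_dist(p: Point, cloud: Sequence[Point]) -> float:
    """Squared Euclidean distance from p to its nearest neighbour in cloud (brute force)."""
    return min(
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
        for q in cloud
    )


def chamfer_distance(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed convention): mean squared
    nearest-neighbour distance pred->gt plus the same for gt->pred."""
    pred_to_gt = sum(_nearest_sq_dist(p, gt) for p in pred) / len(pred)
    gt_to_pred = sum(_nearest_sq_dist(q, pred) for q in gt) / len(gt)
    return pred_to_gt + gt_to_pred


def f_score(pred: Sequence[Point], gt: Sequence[Point], tau: float = 0.01) -> float:
    """F-score at distance threshold tau (hypothetical default).

    precision: fraction of predicted points within tau of some ground-truth point
    recall:    fraction of ground-truth points within tau of some predicted point
    """
    tau_sq = tau * tau  # compare squared distances to avoid square roots
    precision = sum(_nearest_sq_dist(p, gt) <= tau_sq for p in pred) / len(pred)
    recall = sum(_nearest_sq_dist(q, pred) <= tau_sq for q in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]  # one point offset by 0.1
    print(chamfer_distance(pred, gt))  # approximately 0.01
    print(f_score(pred, gt, tau=0.05))  # 0.5: the offset point misses the threshold
```

The nearest-neighbour search here is O(N·M) and is only meant to make the definitions concrete. Evaluations on dense clouds typically use a KD-tree or a GPU kernel for the same computation.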
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Remote Sensing and LiDAR Applications · 3D Shape Modeling and Analysis
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Diffusion · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention
