GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu

TL;DR
GS-LRM is a scalable transformer-based model that efficiently reconstructs high-quality 3D Gaussian primitives from sparse images, outperforming previous methods in object and scene modeling tasks.
Contribution
Introduces GS-LRM, a simple yet effective transformer-based large reconstruction model capable of handling diverse scenes from sparse images.
Findings
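Predicts 3D Gaussian primitives from 2-4 posed images in 0.23 seconds on a single A100 GPU.
Outperforms state-of-the-art baselines by a wide margin on both object captures (trained on Objaverse) and scene captures (trained on RealEstate10K).
Handles scenes with large variations in scale and complexity by predicting per-pixel Gaussians, whereas previous LRMs could only reconstruct objects.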
Abstract
We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on a single A100 GPU. Our model features a very simple transformer-based architecture: we patchify the input posed images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode the final per-pixel Gaussian parameters directly from these tokens for differentiable rendering. In contrast to previous LRMs that can only reconstruct objects, GS-LRM predicts per-pixel Gaussians and so naturally handles scenes with large variations in scale and complexity. We show that our model can work on both object and scene captures by training it on Objaverse and RealEstate10K, respectively. In both scenarios, the models outperform state-of-the-art baselines by a wide margin. We also demonstrate applications of our model…
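The patchify, token, and per-pixel decode pipeline described in the abstract can be illustrated with a small shape-level sketch. This is not the authors' implementation: the transformer blocks are replaced by two random linear maps purely to keep the example short, the sizes (4 views, 32x32 images, patch size 8, hidden width 64) are arbitrary, and the 12 output channels per pixel are an assumed illustrative layout for the Gaussian parameters. Pose encoding of the inputs is omitted. All function and variable names are hypothetical.

```python
import numpy as np

def patchify(images, patch):
    """Split (V, H, W, C) images into a flat sequence of patch tokens."""
    v, h, w, c = images.shape
    assert h % patch == 0 and w % patch == 0
    x = images.reshape(v, h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (V, H/p, W/p, p, p, C)
    return x.reshape(v * (h // patch) * (w // patch), patch * patch * c)

def unpatchify(tokens, views, height, width, patch):
    """Inverse of patchify: token sequence back to (V, H, W, C) maps."""
    c = tokens.shape[1] // (patch * patch)
    x = tokens.reshape(views, height // patch, width // patch, patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (V, H/p, p, W/p, p, C)
    return x.reshape(views, height, width, c)

# Assumed illustrative sizes, not values taken from the paper.
V, H, W, C = 4, 32, 32, 3   # views, height, width, image channels
P, D, G = 8, 64, 12         # patch size, hidden width, params per Gaussian

rng = np.random.default_rng(0)
images = rng.random((V, H, W, C))

# 1) Patchify all views and concatenate their tokens into one sequence.
tokens = patchify(images, P)                      # (64, 192)

# 2) Stand-in for the transformer blocks: a plain linear map (NOT attention).
w_in = rng.normal(size=(P * P * C, D)) * 0.02
hidden = tokens @ w_in                            # (64, 64)

# 3) Decode each token into parameters for every pixel in its patch.
w_out = rng.normal(size=(D, P * P * G)) * 0.02
gaussian_tokens = hidden @ w_out                  # (64, 768)

# 4) Unpatchify: one parameter vector per input pixel.
per_pixel = unpatchify(gaussian_tokens, V, H, W, P)   # (4, 32, 32, 12)
gaussians = per_pixel.reshape(-1, G)                  # (4096, 12)

print(tokens.shape, per_pixel.shape, gaussians.shape)
```

The point of the sketch is the bookkeeping: because every pixel of every input view yields one Gaussian, the number of primitives grows with input resolution and view count (here 4 x 32 x 32 = 4096), which is consistent with the abstract's claim that per-pixel prediction adapts to scenes of varying scale and complexity.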
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Image Processing and 3D Reconstruction
