KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences
Keng-Wei Chang, Zi-Ming Wang, Shang-Hong Lai

TL;DR
This paper introduces KeyGS, a fast and accurate monocular 3D reconstruction method that refines camera poses using Gaussian Splatting, reducing training time from hours to minutes.
Contribution
It proposes a framework that combines Structure-from-Motion (SfM) with 3D Gaussian Splatting (3DGS) and a coarse-to-fine densification scheme, enabling efficient monocular 3D reconstruction without depth or matching models.
Findings
Reduces training time from hours to minutes.
Achieves more accurate novel view synthesis.
Improves camera pose estimation accuracy.
Abstract
Reconstructing high-quality 3D models from sparse 2D images has garnered significant attention in computer vision. Recently, 3D Gaussian Splatting (3DGS) has gained prominence due to its explicit representation with efficient training speed and real-time rendering capabilities. However, existing methods still heavily depend on accurate camera poses for reconstruction. Although some recent approaches attempt to train 3DGS models without the Structure-from-Motion (SfM) preprocessing from monocular video datasets, these methods suffer from prolonged training times, making them impractical for many applications. In this paper, we present an efficient framework that operates without any depth or matching model. Our approach initially uses SfM to quickly obtain rough camera poses within seconds, and then refines these poses by leveraging the dense representation in 3DGS. This framework…
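The "rough pose first, then refine" idea in the abstract can be illustrated with a deliberately simplified toy. This is not the KeyGS implementation: per the abstract, the paper refines poses through the dense 3DGS representation, whereas this sketch swaps in a sparse-point reprojection loss, assumes identity rotation and unit focal length, optimizes translation only, and uses finite-difference gradients. All function names, constants, and test points are hypothetical.

```python
# Toy sketch (not the KeyGS implementation): refine a rough camera translation
# by gradient descent on a reprojection loss. All names here are hypothetical.


def project(point, t):
    """Pinhole projection with identity rotation, translation t, unit focal length."""
    x, y, z = (p + ti for p, ti in zip(point, t))
    return x / z, y / z


def reprojection_loss(t, points, observations):
    """Mean squared distance between projected points and observed 2D positions."""
    total = 0.0
    for point, (u_obs, v_obs) in zip(points, observations):
        u, v = project(point, t)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total / len(points)


def refine_translation(t_init, points, observations, lr=5.0, steps=2000, eps=1e-6):
    """Refine t by plain gradient descent using central finite-difference gradients."""
    t = list(t_init)
    for _ in range(steps):
        grad = []
        for i in range(3):
            t_plus, t_minus = t[:], t[:]
            t_plus[i] += eps
            t_minus[i] -= eps
            diff = (reprojection_loss(t_plus, points, observations)
                    - reprojection_loss(t_minus, points, observations))
            grad.append(diff / (2 * eps))
        t = [ti - lr * gi for ti, gi in zip(t, grad)]
    return t


# Synthetic scene: a few 3D points in front of the camera (assumed values).
POINTS = [
    (-1.0, -1.0, 4.0), (1.0, -1.0, 5.0), (-1.0, 1.0, 6.0),
    (1.0, 1.0, 4.0), (0.0, 0.5, 5.0), (0.5, -0.5, 6.0),
]
TRUE_T = (0.2, -0.1, 0.5)                             # ground-truth translation
OBSERVATIONS = [project(p, TRUE_T) for p in POINTS]   # noise-free 2D observations
ROUGH_T = (0.0, 0.0, 0.0)                             # stand-in for a rough initial pose

if __name__ == "__main__":
    refined = refine_translation(ROUGH_T, POINTS, OBSERVATIONS)
    print("loss at rough pose:  ", reprojection_loss(ROUGH_T, POINTS, OBSERVATIONS))
    print("loss at refined pose:", reprojection_loss(refined, POINTS, OBSERVATIONS))
    print("refined translation: ", refined)
```

In a full pipeline the loss would presumably be computed by rendering and comparing images rather than by reprojecting a handful of known points, and rotation would be optimized as well; the toy only shows why a rough initial estimate is enough to start a gradient-based refinement.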
Taxonomy
Topics: Image Retrieval and Classification Techniques · Medical Image Segmentation Techniques · Remote-Sensing Image Classification
Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
