GS-Pose: Generalizable Segmentation-based 6D Object Pose Estimation with 3D Gaussian Splatting
Dingding Cai, Janne Heikkilä, Esa Rahtu

TL;DR
GS-Pose is a framework for 6D pose estimation of novel objects that combines retrieval-based pose initialization with render-and-compare refinement using 3D Gaussian splatting, achieving state-of-the-art results on the LINEMOD and OnePose-LowTexture datasets.
Contribution
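GS-Pose introduces a unified approach that builds three distinct representations of a previously unseen object from posed RGB images and applies the appropriate one at each stage: object localization, retrieval-based initial pose estimation, and pose refinement with 3D Gaussian splatting.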
Findings
Achieves state-of-the-art accuracy on LINEMOD and OnePose-LowTexture datasets.
Utilizes 3D Gaussian splatting for fast and effective pose refinement.
Supports adding new objects with off-the-shelf toolchains and commodity hardware such as mobile phones.
Abstract
This paper introduces GS-Pose, a unified framework for localizing and estimating the 6D pose of novel objects. GS-Pose begins with a set of posed RGB images of a previously unseen object and builds three distinct representations stored in a database. At inference, GS-Pose operates sequentially by locating the object in the input image, estimating its initial 6D pose using a retrieval approach, and refining the pose with a render-and-compare method. The key insight is the application of the appropriate object representation at each stage of the process. In particular, for the refinement step, we leverage 3D Gaussian splatting, a novel differentiable rendering technique that offers high rendering speed and relatively low optimization time. Off-the-shelf toolchains and commodity hardware, such as mobile phones, can be used to capture new objects to be added to the database. Extensive…
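The retrieve-then-refine idea described in the abstract can be illustrated with a deliberately tiny sketch. This is not the authors' implementation: the "object" here is a single isotropic 2D Gaussian, the "pose" is only a 2D translation, the localization stage is omitted, and every name (`render`, `retrieve`, `refine`, the grid of candidate poses, the learning rate) is a hypothetical choice made for illustration. It only shows how a coarse template lookup can seed a gradient-based render-and-compare loop when the renderer is differentiable.

```python
import numpy as np

# Toy setup (all values are illustrative assumptions, not from the paper).
H, W, SIGMA = 32, 32, 4.0
YS, XS = np.mgrid[0:H, 0:W].astype(float)

def render(pose):
    """Splat one isotropic 2D Gaussian centred at pose = (tx, ty)."""
    tx, ty = pose
    return np.exp(-((XS - tx) ** 2 + (YS - ty) ** 2) / (2 * SIGMA ** 2))

def retrieve(observed, candidates):
    """Coarse stage: return the candidate pose whose rendered template
    has the smallest squared image error against the observation."""
    errors = [np.sum((render(c) - observed) ** 2) for c in candidates]
    return np.array(candidates[int(np.argmin(errors))], dtype=float)

def refine(observed, pose, lr=0.2, steps=100):
    """Fine stage: render-and-compare by gradient descent on the
    summed squared photometric error, using the analytic gradient."""
    pose = pose.copy()
    for _ in range(steps):
        img = render(pose)
        resid = img - observed
        # d(img)/d(tx) = img * (XS - tx) / SIGMA**2, and likewise for ty.
        gx = np.sum(2 * resid * img * (XS - pose[0]) / SIGMA ** 2)
        gy = np.sum(2 * resid * img * (YS - pose[1]) / SIGMA ** 2)
        pose -= lr * np.array([gx, gy])
    return pose

# Simulated observation at a pose that is not on the template grid.
target = np.array([17.3, 12.6])
observed = render(target)

# Template "database": poses on a coarse grid with 4-pixel spacing.
candidates = [(x, y) for x in range(4, 32, 4) for y in range(4, 32, 4)]

initial = retrieve(observed, candidates)   # nearest grid pose
refined = refine(observed, initial)        # sub-pixel estimate
print("initial:", initial, "refined:", refined)
```

In this toy, retrieval only needs to land inside the basin of attraction of the photometric loss, and the refinement step then recovers the off-grid pose. A real system would replace the single Gaussian with a full 3D Gaussian splatting model, the translation with a 6D pose, and the hand-written gradient with automatic differentiation.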
Taxonomy
Topics: Robot Manipulation and Learning · Hand Gesture Recognition Systems · Robotic Mechanisms and Dynamics
Methods: Sparse Evolutionary Training · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
