RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking
Teng Guo, Jingjin Yu

TL;DR
RGBTrack is a real-time 6D pose estimation and tracking framework that needs only RGB input. It infers depth with a binary search combined with render-and-compare, and keeps tracking stable in dynamic scenes using 2D object tracking (XMem), a Kalman filter, and a state machine.
Contribution
It introduces a novel depth inference method and a scale recovery module within a real-time RGB-only 6D pose tracking framework, enhancing robustness and practicality.
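To make the idea of scale recovery concrete, here is a minimal sketch under a pinhole-camera assumption. It is not the paper's implementation: the function `recover_scale`, its parameters, and the numbers are all hypothetical. It assumes that an initial depth estimate and the object's apparent size in the image are available, and derives a factor that converts model units to metres.

```python
def recover_scale(mask_extent_px: float, depth_m: float,
                  focal_px: float, model_extent_units: float) -> float:
    """Return a hypothetical model-units -> metres scale factor.

    Pinhole relation (assumption): an object of metric extent S at depth Z
    spans roughly focal_px * S / Z pixels, so S = pixels * Z / focal_px.
    """
    metric_extent_m = mask_extent_px * depth_m / focal_px
    return metric_extent_m / model_extent_units


# Illustrative numbers only: a 150 px wide mask at 0.8 m with a 600 px focal
# length implies a 0.2 m wide object; a model 2.0 units wide needs scale 0.1.
scale = recover_scale(mask_extent_px=150.0, depth_m=0.8,
                      focal_px=600.0, model_extent_units=2.0)
print(f"scale factor: {scale:.3f}")
```

The sketch treats the object as fronto-parallel and ignores perspective distortion, so it is only a first-order estimate of the kind a later refinement step could correct.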
Findings
Achieves competitive accuracy on benchmark datasets.
Operates in real-time without depth sensors.
Maintains stable tracking during rapid movements and occlusions.
Abstract
We introduce a robust framework, RGBTrack, for real-time 6D pose estimation and tracking that operates solely on RGB data, thereby eliminating the need for depth input for such dynamic and precise object pose tracking tasks. Building on the FoundationPose architecture, we devise a novel binary search strategy combined with a render-and-compare mechanism to efficiently infer depth and generate robust pose hypotheses from true-scale CAD models. To maintain stable tracking in dynamic scenarios, including rapid movements and occlusions, RGBTrack integrates state-of-the-art 2D object tracking (XMem) with a Kalman filter and a state machine for proactive object pose recovery. In addition, RGBTrack's scale recovery module dynamically adapts CAD models of unknown scale using an initial depth estimate, enabling seamless integration with modern generative reconstruction techniques. Extensive…
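The binary search over depth mentioned in the abstract can be illustrated with a toy example. This is a sketch under stated assumptions, not the authors' method: the renderer is replaced by a hypothetical function `rendered_area` that returns the silhouette area of a sphere, and the comparison step only matches silhouette area against an observed mask area. The point it shows is that, for a true-scale model, the rendered silhouette shrinks monotonically with depth, so depth can be bracketed and halved.

```python
import math


def rendered_area(depth_m: float, radius_m: float = 0.05,
                  focal_px: float = 600.0) -> float:
    """Toy stand-in for a renderer (assumption): silhouette area in px^2
    of a sphere at the given depth, using the approximation r_px = f * r / z."""
    r_px = focal_px * radius_m / depth_m
    return math.pi * r_px ** 2


def search_depth(observed_area: float, render_area=rendered_area,
                 z_near: float = 0.1, z_far: float = 5.0,
                 tol: float = 1e-4, max_iter: int = 60) -> float:
    """Binary search for the depth whose rendered area matches the observation."""
    lo, hi = z_near, z_far
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if render_area(mid) > observed_area:
            lo = mid  # render is too large, so the object must be farther away
        else:
            hi = mid  # render is too small (or equal), so the object is nearer
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Simulate an observation at a known depth, then recover that depth.
true_depth = 0.8
observed = rendered_area(true_depth)
estimate = search_depth(observed)
print(f"estimated depth: {estimate:.3f} m")
```

In a real pipeline the comparison would involve rendering the CAD model under a pose hypothesis and scoring it against the image, which is more expensive than this closed-form area; the logarithmic number of iterations is what keeps such a search cheap.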
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Robotics and Sensor-Based Localization
