PoseStreamer: A Multi-modal Framework for 3D Tracking of Unseen Moving Objects
Huiming Yang, Linglin Liao, Fei Ding, Sibo Wang, Zijian Zeng

TL;DR
PoseStreamer is a multi-modal framework that significantly improves 6DoF pose estimation for unseen, fast-moving objects in challenging scenarios by integrating temporal, 2D, and geometric cues.
Contribution
The paper introduces PoseStreamer, a multi-modal approach with components for temporal consistency, 2D priors, and geometric refinement, along with a new dataset for high-speed object tracking.
Findings
Achieves superior accuracy in high-speed scenarios
Demonstrates strong generalizability to unseen objects
Outperforms existing methods in rapid motion conditions
Abstract
Six-degree-of-freedom (6DoF) pose estimation for novel objects is a critical task in computer vision, yet it faces significant challenges in high-speed and low-light scenarios where standard RGB cameras suffer from motion blur. While event cameras offer a promising solution due to their high temporal resolution, current 6DoF pose estimation methods typically yield suboptimal performance when objects move at high speed. To address this gap, we propose PoseStreamer, a robust multi-modal 6DoF pose estimation framework designed specifically for high-speed motion scenarios. Our approach integrates three core components: an Adaptive Pose Memory Queue that utilizes historical orientation cues for temporal consistency, an Object-centric 2D Tracker that provides strong 2D priors to boost 3D center recall, and a Ray Pose Filter for geometric refinement along camera rays. Furthermore, we…
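The abstract does not say how the Ray Pose Filter works internally. As a rough geometric illustration of what "refinement along camera rays" can mean, the sketch below assumes a pinhole camera and snaps a coarse 3D center estimate onto the ray through a tracked 2D center. The function names, the intrinsics, and the numeric values are all hypothetical and are not taken from the paper.

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Unit direction of the camera ray through pixel (u, v), pinhole model (assumed)."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)

def project_onto_ray(point, ray):
    """Closest point to `point` on the ray that starts at the camera origin."""
    # Signed distance along the ray; clamp so the result stays in front of the camera.
    t = max(sum(p * r for p, r in zip(point, ray)), 0.0)
    return tuple(t * r for r in ray)

# Hypothetical inputs: a 2D center from a tracker and a coarse 3D center estimate.
ray = pixel_to_ray(320.0, 240.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
coarse_center = (0.05, -0.02, 1.0)
refined_center = project_onto_ray(coarse_center, ray)
```

Here the 2D prior fixes the direction of the object center and only the depth along that ray remains free, which is one plausible reading of why a strong 2D tracker could help 3D center recall.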
Taxonomy
Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · Human Pose and Action Recognition
