PoseStreamer: A Multi-modal Framework for 3D Tracking of Unseen Moving Objects

Huiming Yang; Linglin Liao; Fei Ding; Sibo Wang; Zijian Zeng

arXiv:2512.22979·cs.CV·January 5, 2026

Open Access

TL;DR

PoseStreamer is a multi-modal framework that significantly improves 6DoF pose estimation for unseen, fast-moving objects in challenging scenarios by integrating temporal, 2D, and geometric cues.

Contribution

It introduces PoseStreamer, a novel multi-modal approach with components for temporal consistency, 2D priors, and geometric refinement, along with a new dataset for high-speed object tracking.

Findings

01. Achieves superior accuracy in high-speed scenarios

02. Demonstrates strong generalizability to unseen objects

03. Outperforms existing methods in rapid motion conditions

Abstract

Six-degree-of-freedom (6DoF) pose estimation for novel objects is a critical task in computer vision, yet it faces significant challenges in high-speed and low-light scenarios, where standard RGB cameras suffer from motion blur. While event cameras offer a promising solution thanks to their high temporal resolution, current 6DoF pose estimation methods typically yield suboptimal performance when objects move at high speed. To address this gap, we propose PoseStreamer, a robust multi-modal 6DoF pose estimation framework designed specifically for high-speed motion scenarios. Our approach integrates three core components: an Adaptive Pose Memory Queue that uses historical orientation cues for temporal consistency, an Object-centric 2D Tracker that provides strong 2D priors to boost 3D center recall, and a Ray Pose Filter for geometric refinement along camera rays. Furthermore, we…
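The abstract gives no implementation details, so the following is only a minimal sketch of the geometry behind "refinement along camera rays", not the authors' method. It assumes a pinhole camera model: a 2D object center (for example, one supplied by a 2D tracker) back-projects to a ray in the camera frame, and any 3D center consistent with that pixel lies somewhere along the ray. The names `pixel_to_ray`, `point_on_ray`, and `pose_memory`, the intrinsics values, and the queue length are all hypothetical and chosen for illustration.

```python
import math
from collections import deque


def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a unit ray direction in the camera frame.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy); lens distortion is ignored.
    """
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


def point_on_ray(ray, distance):
    """Return the 3D point at the given distance from the camera center along the ray."""
    return tuple(distance * c for c in ray)


# Hypothetical bounded history of recent orientations, stored as unit
# quaternions (w, x, y, z). deque(maxlen=...) drops the oldest entry
# automatically; the paper's queue is described as adaptive, which this
# fixed-length stand-in does not model.
pose_memory = deque(maxlen=4)
for _ in range(6):
    pose_memory.append((1.0, 0.0, 0.0, 0.0))

# A pixel at the principal point back-projects to the optical axis.
ray = pixel_to_ray(320.0, 240.0, 500.0, 500.0, 320.0, 240.0)
center = point_on_ray(ray, 2.0)
```

Constraining the 3D center to a ray reduces the translation search from three unknowns to one (the distance along the ray), which is why a reliable 2D center is a useful prior for 3D localization.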

Taxonomy

Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · Human Pose and Action Recognition