VIPose: Real-time Visual-Inertial 6D Object Pose Tracking
Rundong Ge, Giuseppe Loianno

TL;DR
VIPose is a real-time deep learning method that fuses visual and inertial data to track 6D object poses across frames. It is especially effective on occluded objects and is validated on a new dataset.
Contribution
Introduces VIPose, a novel DNN architecture that combines visual and inertial data for real-time 6D object pose tracking, improving accuracy and robustness.
Findings
Achieves real-time 6D pose tracking with high accuracy.
Performs well on heavily occluded objects.
Comparable to state-of-the-art methods in accuracy.
Abstract
Estimating the 6D pose of objects is beneficial for robotics tasks such as transportation, autonomous navigation, and manipulation, as well as in scenarios beyond robotics like virtual and augmented reality. Compared with single-image pose estimation, pose tracking takes into account the temporal information across multiple frames to overcome possible detection inconsistencies and to improve the pose estimation efficiency. In this work, we introduce a novel Deep Neural Network (DNN) called VIPose that combines inertial and camera data to address the object pose tracking problem in real time. The key contribution is the design of a novel DNN architecture which fuses visual and inertial features to predict the objects' relative 6D pose between consecutive image frames. The overall 6D pose is then estimated by consecutively combining relative poses. Our approach shows remarkable pose…
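The step of "consecutively combining relative poses" can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each pose is a 4x4 homogeneous transform and that every relative pose is applied by left multiplication to the previous absolute pose. The function names (matmul4, make_pose, accumulate_poses) are hypothetical, and the yaw-only rotation is used purely to keep the example short.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def make_pose(yaw_rad, tx, ty, tz):
    """Build a 4x4 homogeneous transform from a rotation about z and a translation.

    A yaw-only rotation is an illustrative simplification; a real 6D pose
    carries a full 3D rotation.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def accumulate_poses(initial_pose, relative_poses):
    """Chain frame-to-frame relative poses into absolute poses.

    Assumed convention (not confirmed by the source): the pose at frame k
    is relative_k @ pose_{k-1}.
    """
    pose = initial_pose
    trajectory = [pose]
    for rel in relative_poses:
        pose = matmul4(rel, pose)
        trajectory.append(pose)
    return trajectory


# Example: the object starts 1 unit along x, then two 90-degree relative
# rotations about z are applied in sequence.
start = make_pose(0.0, 1.0, 0.0, 0.0)
steps = [make_pose(math.pi / 2, 0.0, 0.0, 0.0)] * 2
trajectory = accumulate_poses(start, steps)
```

Under this chaining, any error in a single relative estimate propagates to every later absolute pose, so the accuracy of the per-frame prediction governs the accuracy of the whole track.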
Taxonomy
Topics: Robotics and Sensor-Based Localization · Robot Manipulation and Learning · Advanced Vision and Imaging
