Fusing Visual Appearance and Geometry for Multi-modality 6DoF Object Tracking
Manuel Stoiber, Mariam Elsayed, Anne E. Reichert, Florian Steidle, Dongheui Lee, Rudolph Triebel

TL;DR
This paper introduces a multi-modality 6DoF object tracking method that combines visual appearance and geometric information, achieving high accuracy and efficiency in robotic manipulation tasks.
Contribution
The work extends the authors' previous geometry-based tracker, ICG, with two visual appearance modalities: local keypoint features matched between keyframes and the current image, and a global region approach that considers multiple regions on the object surface.
Findings
Outperforms existing methods on YCB-Video and OPT datasets.
Runs at over 300 Hz, demonstrating high efficiency.
Effectively combines appearance and geometry for robust tracking.
Abstract
In many applications of advanced robotic manipulation, six degrees of freedom (6DoF) object pose estimates are continuously required. In this work, we develop a multi-modality tracker that fuses information from visual appearance and geometry to estimate object poses. The algorithm extends our previous method ICG, which uses geometry, to additionally consider surface appearance. In general, object surfaces contain local characteristics from text, graphics, and patterns, as well as global differences from distinct materials and colors. To incorporate this visual information, two modalities are developed. For local characteristics, keypoint features are used to minimize distances between points from keyframes and the current image. For global differences, a novel region approach is developed that considers multiple regions on the object surface. In addition, it allows the modeling of…
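The keypoint idea in the abstract, minimizing distances between points from keyframes and the current image, can be illustrated with a small reprojection cost. This is a minimal sketch under stated assumptions, not the authors' implementation: the pose is restricted to a rotation about the camera z-axis plus a translation, the correspondences are assumed to be already matched, and all names (`rotate_z`, `project`, `keypoint_cost`) and numeric values are hypothetical.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def keypoint_cost(model_points, observed_pixels, angle, translation, intrinsics):
    """Sum of squared pixel distances between projected keyframe points
    and their matched keypoints in the current image."""
    fx, fy, cx, cy = intrinsics
    cost = 0.0
    for p, (u_obs, v_obs) in zip(model_points, observed_pixels):
        rx, ry, rz = rotate_z(p, angle)
        cam = (rx + translation[0], ry + translation[1], rz + translation[2])
        u, v = project(cam, fx, fy, cx, cy)
        cost += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return cost

# Hypothetical camera intrinsics (fx, fy, cx, cy) and keyframe points in meters.
intrinsics = (600.0, 600.0, 320.0, 240.0)
model_points = [(0.05, 0.0, 0.0), (0.0, 0.05, 0.0),
                (-0.05, 0.0, 0.02), (0.0, -0.05, 0.01)]

# Synthesize "current image" keypoints from a known pose (assumed values).
true_angle, true_t = 0.1, (0.01, -0.02, 0.5)
observed = []
for p in model_points:
    rx, ry, rz = rotate_z(p, true_angle)
    cam = (rx + true_t[0], ry + true_t[1], rz + true_t[2])
    observed.append(project(cam, *intrinsics))

# The cost vanishes at the true pose and grows for a wrong pose guess.
cost_true = keypoint_cost(model_points, observed, true_angle, true_t, intrinsics)
cost_off = keypoint_cost(model_points, observed, 0.0, (0.0, 0.0, 0.5), intrinsics)
```

A tracker would minimize such a cost over the full 6DoF pose, typically jointly with terms from the other modalities; the sketch only evaluates the cost at two candidate poses.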
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Face recognition and analysis
Methods: OPT
