V-HOP: Visuo-Haptic 6D Object Pose Tracking
Hongyu Li, Mingxi Jia, Tuluhan Akbulut, Yu Xiang, George Konidaris, Srinath Sridhar

TL;DR
V-HOP is a transformer-based visuo-haptic 6D object pose tracker that fuses visual and tactile data. In real-world manipulation tasks, it is more robust and generalizes better than existing methods.
Contribution
The paper presents a unified haptic representation that handles multiple gripper embodiments, and builds on it a transformer-based visuo-haptic tracking framework that improves pose estimation across diverse sensors and embodiments.
Findings
Significant performance improvements on challenging sequences.
Enhanced generalization across different sensors and objects.
Outperforms state-of-the-art visual trackers in real-world experiments.
Abstract
Humans naturally integrate vision and haptics for robust object perception during manipulation. The loss of either modality significantly degrades performance. Inspired by this multisensory integration, prior object pose estimation research has attempted to combine visual and haptic/tactile feedback. Although these works demonstrate improvements in controlled environments or synthetic datasets, they often underperform vision-only approaches in real-world settings due to poor generalization across diverse grippers, sensor layouts, or sim-to-real environments. Furthermore, they typically estimate the object pose for each frame independently, resulting in less coherent tracking over sequences in real-world deployments. To address these limitations, we introduce a novel unified haptic representation that effectively handles multiple gripper embodiments. Building on this representation, we…
