V-HOP: Visuo-Haptic 6D Object Pose Tracking

Hongyu Li; Mingxi Jia; Tuluhan Akbulut; Yu Xiang; George Konidaris; Srinath Sridhar

arXiv:2502.17434 · cs.RO · September 12, 2025

TL;DR

V-HOP is a transformer-based visuo-haptic 6D object pose tracker that fuses visual and tactile data. It is reported to be more robust and to generalize better than existing methods in real-world manipulation tasks.

Contribution

The paper presents a novel unified haptic representation and a transformer-based visuo-haptic tracking framework that improves pose estimation across diverse sensors and embodiments.

Findings

01. Significant performance improvements on challenging sequences.

02. Enhanced generalization across different sensors and objects.

03. Outperforms state-of-the-art visual trackers in real-world experiments.

Abstract

Humans naturally integrate vision and haptics for robust object perception during manipulation. The loss of either modality significantly degrades performance. Inspired by this multisensory integration, prior object pose estimation research has attempted to combine visual and haptic/tactile feedback. Although these works demonstrate improvements in controlled environments or synthetic datasets, they often underperform vision-only approaches in real-world settings due to poor generalization across diverse grippers, sensor layouts, or sim-to-real environments. Furthermore, they typically estimate the object pose for each frame independently, resulting in less coherent tracking over sequences in real-world deployments. To address these limitations, we introduce a novel unified haptic representation that effectively handles multiple gripper embodiments. Building on this representation, we…
