Learning When to See and When to Feel: Adaptive Vision-Torque Fusion for Contact-Aware Manipulation
Jiuzhou Lei, Chang Liu, Yu She, Xiao Liang, Minghui Zheng

TL;DR
This paper compares various strategies for integrating vision and force/torque signals in robotic manipulation and introduces an adaptive fusion method that improves success rates by selectively using F/T data.
Contribution
It provides a comprehensive comparison of existing F/T-vision integration methods and proposes a novel adaptive strategy that enhances manipulation performance.
Findings
The adaptive fusion method outperforms baseline approaches by 14% in success rate.
Contact-aware multimodal fusion significantly improves manipulation success.
The study highlights the importance of context-dependent sensor integration.
Abstract
Vision-based policies have achieved good performance in robotic manipulation due to the accessibility and richness of visual observations. However, purely visual sensing becomes insufficient in contact-rich and force-sensitive tasks, where force/torque (F/T) signals provide critical information about contact dynamics, alignment, and interaction quality. Although various strategies have been proposed to integrate vision and F/T signals, including auxiliary prediction objectives, mixture-of-experts architectures, and contact-aware gating mechanisms, a comparison of these approaches remains lacking. In this work, we provide a comparative study of different F/T-vision integration strategies within diffusion-based manipulation policies. In addition, we propose an adaptive integration strategy that ignores F/T signals during non-contact phases while adaptively leveraging both vision and…
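To make the idea of ignoring F/T signals outside of contact more concrete, here is a minimal sketch of a contact-aware gate. It is not the authors' method: the excerpt above does not specify how their adaptive strategy is built, and in a real policy the gate would more likely be learned. The function names `contact_gate` and `fuse`, the force-norm heuristic, and the `threshold` and `sharpness` values are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def contact_gate(wrench: Sequence[float],
                 threshold: float = 2.0,
                 sharpness: float = 4.0) -> float:
    """Hypothetical soft contact gate in (0, 1).

    `wrench` is assumed to be [fx, fy, fz, tx, ty, tz]. Only the force
    part is used here; threshold (in newtons) and sharpness are
    illustrative values, not taken from the paper.
    """
    force_norm = math.sqrt(sum(f * f for f in wrench[:3]))
    # Logistic function: close to 0 in free space, close to 1 in contact.
    return 1.0 / (1.0 + math.exp(-sharpness * (force_norm - threshold)))


def fuse(vision_feat: Sequence[float],
         ft_feat: Sequence[float],
         wrench: Sequence[float]) -> List[float]:
    """Concatenate vision features with gate-scaled F/T features.

    When the gate is near 0 the F/T features are suppressed, so the
    downstream policy effectively conditions on vision alone.
    """
    g = contact_gate(wrench)
    return list(vision_feat) + [g * x for x in ft_feat]


# Free-space motion: the F/T features are almost fully suppressed.
free_space = fuse([1.0, 2.0], [3.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

# Firm contact: the F/T features pass through almost unchanged.
in_contact = fuse([1.0, 2.0], [3.0], [0.0, 0.0, 10.0, 0.0, 0.0, 0.0])
```

A soft gate is used rather than a hard switch so that the fused features change smoothly around the moment of contact. In a diffusion-based policy, the fused vector would serve as part of the conditioning input, though how the paper wires this in is not stated in the excerpt.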
