Learning When to See and When to Feel: Adaptive Vision-Torque Fusion for Contact-Aware Manipulation

Jiuzhou Lei; Chang Liu; Yu She; Xiao Liang; Minghui Zheng

arXiv:2604.01414 · cs.RO · April 3, 2026


TL;DR

This paper compares various strategies for integrating vision and force/torque signals in robotic manipulation and introduces an adaptive fusion method that improves success rates by selectively using F/T data.

Contribution

It provides a comprehensive comparison of existing F/T-vision integration methods and proposes a novel adaptive strategy that enhances manipulation performance.

Findings

01. The adaptive fusion method outperforms baseline approaches by 14% in success rate.

02. Contact-aware multimodal fusion significantly improves manipulation success.

03. The study highlights the importance of context-dependent sensor integration.

Abstract

Vision-based policies have achieved good performance in robotic manipulation thanks to the accessibility and richness of visual observations. However, purely visual sensing is insufficient in contact-rich and force-sensitive tasks, where force/torque (F/T) signals provide critical information about contact dynamics, alignment, and interaction quality. Although various strategies have been proposed to integrate vision and F/T signals, including auxiliary prediction objectives, mixture-of-experts architectures, and contact-aware gating mechanisms, a comparison of these approaches is still lacking. In this work, we provide a comparative study of different F/T-vision integration strategies within diffusion-based manipulation policies. In addition, we propose an adaptive integration strategy that ignores F/T signals during non-contact phases while adaptively leveraging both vision and…
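To make the idea of contact-aware gating concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names `contact_mask` and `fuse_features`, the 2.0 N force threshold, and the simple concatenation of features are all assumptions chosen for illustration. The sketch only shows the general pattern the abstract describes, where F/T features are suppressed when no contact is detected and passed through alongside vision features otherwise.

```python
import numpy as np


def contact_mask(wrench, threshold=2.0):
    """Hypothetical hard contact detector.

    wrench: array of shape (..., 6) holding [Fx, Fy, Fz, Tx, Ty, Tz].
    threshold: assumed force-magnitude cutoff in newtons (illustrative value).
    Returns 1.0 where the force magnitude exceeds the threshold, else 0.0,
    with a trailing axis of size 1 so it broadcasts over feature vectors.
    """
    force_mag = np.linalg.norm(wrench[..., :3], axis=-1)
    return np.asarray(force_mag > threshold, dtype=float)[..., None]


def fuse_features(vision_feat, ft_feat, wrench, threshold=2.0):
    """Concatenate vision features with contact-gated F/T features.

    In free space the gate is 0, so the F/T part of the fused vector is
    zeroed out; in contact the gate is 1 and the F/T features pass through.
    """
    gate = contact_mask(wrench, threshold)
    return np.concatenate([vision_feat, gate * ft_feat], axis=-1)


# Example with made-up feature vectors (4-D vision, 3-D F/T).
vision = np.ones(4)
ft = np.full(3, 2.0)
free_space = fuse_features(vision, ft, np.zeros(6))
in_contact = fuse_features(vision, ft, np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0]))
```

A learned policy would more plausibly use a soft or learned gate rather than a fixed threshold; the hard mask is used here only because it makes the non-contact behavior easy to inspect.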
