Touch begins where vision ends: Generalizable policies for contact-rich manipulation

Zifan Zhao; Siddhant Haldar; Jinda Cui; Lerrel Pinto; Raunaq Bhirangi

arXiv:2506.13762 · cs.RO · June 17, 2025

Open Access

TL;DR

This paper introduces ViTaL, a two-phase policy framework for contact-rich manipulation: a vision-language model localizes the object of interest, then a local policy using egocentric vision and tactile sensing performs the interaction, reaching around 90% success in unseen environments.

Contribution

The paper decomposes manipulation into a VLM-guided reaching phase and a reusable, scene-agnostic visuotactile local policy, so that a policy trained once in a canonical setting yields generalizable and robust contact-rich manipulation.

Findings

01. Achieves around 90% success in unseen environments

02. Robust to distractors and scene variations

03. Tactile sensing significantly improves performance

Abstract

Data-driven approaches struggle with precise manipulation; imitation learning requires many hard-to-obtain demonstrations, while reinforcement learning yields brittle, non-generalizable policies. We introduce VisuoTactile Local (ViTaL) policy learning, a framework that solves fine-grained manipulation tasks by decomposing them into two phases: a reaching phase, where a vision-language model (VLM) enables scene-level reasoning to localize the object of interest, and a local interaction phase, where a reusable, scene-agnostic ViTaL policy performs contact-rich manipulation using egocentric vision and tactile sensing. This approach is motivated by the observation that while scene context varies, the low-level interaction remains consistent across task instances. By training local policies once in a canonical setting, they can generalize via a localize-then-execute strategy. ViTaL achieves…
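The localize-then-execute decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `localize_then_execute`, `ToyScene`, `toy_local_policy`, and the 1-D geometry are all hypothetical stand-ins chosen for illustration. The VLM is replaced by a stub that returns a known target position, and the learned ViTaL policy by a hand-written rule. The sketch only shows the control flow: a scene-level step brings the gripper near the object, and a local policy that sees only relative, egocentric observations finishes the task.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Position = Tuple[float, float, float]


@dataclass
class Observation:
    """Hypothetical local observation: egocentric view plus tactile reading."""
    egocentric_image: Sequence[float]  # stand-in for wrist-camera features
    tactile: Sequence[float]           # stand-in for tactile sensor values


def localize_then_execute(
    localize: Callable[[str], Position],
    reach: Callable[[Position], None],
    observe: Callable[[], Observation],
    local_policy: Callable[[Observation], Sequence[float]],
    act: Callable[[Sequence[float]], bool],
    instruction: str,
    max_steps: int = 100,
) -> bool:
    """Run a reaching phase, then a local interaction phase.

    All callables are assumptions of this sketch, not an API from the paper.
    """
    # Phase 1 (reaching): scene-level reasoning localizes the object.
    target = localize(instruction)
    reach(target)

    # Phase 2 (local interaction): closed loop on local observations only.
    for _ in range(max_steps):
        action = local_policy(observe())
        if act(action):  # act() reports whether the task is complete
            return True
    return False


class ToyScene:
    """Hypothetical 1-D scene used only to exercise the control flow."""

    def __init__(self, target_x: float) -> None:
        self.target_x = target_x
        self.gripper_x = 0.0

    def localize(self, instruction: str) -> Position:
        # Stub for the VLM: returns the target position directly.
        return (self.target_x, 0.0, 0.0)

    def reach(self, target: Position) -> None:
        # Coarse reach that stops 0.05 short of the target.
        self.gripper_x = target[0] - 0.05

    def observe(self) -> Observation:
        # Observations are relative to the gripper, not to the scene.
        err = self.target_x - self.gripper_x
        return Observation(egocentric_image=[err], tactile=[abs(err)])

    def act(self, action: Sequence[float]) -> bool:
        self.gripper_x += action[0]
        return abs(self.target_x - self.gripper_x) < 1e-3


def toy_local_policy(obs: Observation) -> Sequence[float]:
    # Hand-written rule: direction from the view, magnitude from touch.
    direction = 1.0 if obs.egocentric_image[0] >= 0 else -1.0
    return [0.5 * direction * obs.tactile[0]]
```

Because `toy_local_policy` never sees the absolute target position, the same policy works unchanged for `ToyScene(target_x=0.3)` and `ToyScene(target_x=2.0)`, which mirrors, in a very simplified way, the idea that the low-level interaction stays consistent while scene context varies.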

Taxonomy

Topics: Tactile and Sensory Interactions