TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
Zhemeng Zhang, Jiahua Ma, Xincheng Yang, Xin Wen, Yuzhi Zhang, Boyan Li, Yiran Qin, Jin Liu, Can Zhao, Li Kang, Haoqin Hong, Zhenfei Yin, Philip Torr, Hao Su, Ruimao Zhang, Daolin Ma

TL;DR
TouchGuide is a framework that steers a pre-trained visuomotor policy with tactile feedback at inference time, refining its actions through tactile guidance to improve performance on contact-rich manipulation tasks.
Contribution
The paper introduces two things: a cross-policy visuo-tactile fusion paradigm built around a tactile guidance mechanism, and TacUMI, a cost-effective tactile data collection system.
Findings
TouchGuide outperforms state-of-the-art policies on five contact-rich tasks.
The tactile guidance refines actions to satisfy physical contact constraints.
TacUMI enables reliable tactile data collection with affordable hardware.
Abstract
Fine-grained and contact-rich manipulation remain challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, the policy produces a coarse, visually-plausible action using only visual inputs during early sampling. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance to steer and refine the action, ensuring it aligns with realistic physical contact conditions. Trained through contrastive learning on limited expert demonstrations, the CPM provides a tactile-informed feasibility score to steer the sampling process toward refined actions that satisfy…
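The two-stage sampling described in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation. Every name here is hypothetical: `base_velocity` stands in for a pre-trained flow-matching policy, and `feasibility_grad` stands in for the gradient of the CPM's feasibility score, which is replaced by a simple quadratic score around an assumed contact-consistent action. Early steps follow the visual policy alone, and later steps add the score gradient to the velocity.

```python
import numpy as np

def base_velocity(action, t, visual_target):
    """Toy stand-in (assumption) for a pre-trained flow-matching policy:
    a straight-line flow toward a visually plausible action."""
    return (visual_target - action) / max(1.0 - t, 1e-3)

def feasibility_grad(action, contact_target):
    """Gradient of a toy feasibility score (assumption):
    s(a) = -0.5 * ||a - contact_target||^2, so grad s(a) = contact_target - a.
    A learned CPM would supply this gradient from tactile observations."""
    return contact_target - action

def sample_action(noise, visual_target, contact_target,
                  num_steps=10, guide_from=5, guide_scale=2.0):
    """Euler integration of the flow, with guidance only from step guide_from."""
    action = np.array(noise, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        velocity = base_velocity(action, t, visual_target)
        if k >= guide_from:
            # Stage 2: steer the sample toward higher feasibility score.
            velocity = velocity + guide_scale * feasibility_grad(action, contact_target)
        action = action + dt * velocity
    return action

rng = np.random.default_rng(0)
noise = rng.standard_normal(2)
visual_target = np.array([1.0, 0.0])   # what vision alone would pick (toy value)
contact_target = np.array([1.0, 0.5])  # contact-consistent action (toy value)

# guide_from == num_steps disables guidance entirely.
unguided = sample_action(noise, visual_target, contact_target, guide_from=10)
guided = sample_action(noise, visual_target, contact_target, guide_from=5)

print("unguided distance to contact target:",
      np.linalg.norm(unguided - contact_target))
print("guided distance to contact target:  ",
      np.linalg.norm(guided - contact_target))
```

In this toy setting the unguided sample lands on the visual target, while the guided sample ends closer to the contact-consistent action. The guidance scale and the step at which guidance starts are illustrative choices, not values taken from the paper.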
