TaF-VLA: Tactile-Force Alignment in Vision-Language-Action Models for Force-aware Manipulation
Yuzhe Huang, Pei Lin, Wanlin Li, Daohan Li, Jiajun Li, Jiaming Jiang, Chenxi Xiao, Ziyuan Jiao

TL;DR
TaF-VLA is a framework that explicitly aligns tactile observations with physical interaction forces in vision-language-action models. It improves force-aware robotic manipulation using a large-scale tactile-force dataset and a specialized tactile encoder.
Contribution
The paper presents TaF-VLA, which shifts from tactile-vision alignment to tactile-force alignment. Its contributions include a large tactile-force dataset and a tactile encoder intended to improve physical reasoning in manipulation tasks.
Findings
Outperforms state-of-the-art baselines in contact-rich tasks
Demonstrates robust, force-aware manipulation capabilities
Validates the effectiveness of tactile-force alignment
Abstract
Vision-Language-Action (VLA) models have recently emerged as powerful generalists for robotic manipulation. However, due to their predominant reliance on visual modalities, they fundamentally lack the physical intuition required for contact-rich tasks that require precise force regulation and physical reasoning. Existing attempts to incorporate vision-based tactile sensing into VLA models typically treat tactile inputs as auxiliary visual textures, thereby overlooking the underlying correlation between surface deformation and interaction dynamics. To bridge this gap, we propose a paradigm shift from tactile-vision alignment to tactile-force alignment. Here, we introduce TaF-VLA, a framework that explicitly grounds high-dimensional tactile observations in physical interaction forces. To facilitate this, we develop an automated tactile-force data acquisition device and curate the…
Taxonomy
Topics: Advanced Sensor and Energy Harvesting Materials · Tactile and Sensory Interactions · Robot Manipulation and Learning
