FG-CLTP: Fine-Grained Contrastive Language Tactile Pretraining for Robotic Manipulation
Wenxuan Ma, Chaofan Zhang, Yinghao Cai, Guocai Yao, Shaowei Cui, Shuo Wang

TL;DR
This paper introduces FG-CLTP, a framework for fine-grained contrastive language-tactile pretraining in robotics. It uses a large tactile-language dataset and discretized numerical tokenization to improve manipulation accuracy and generalization.
Contribution
The paper presents a new dataset of tactile 3D point cloud-language pairs, a discretized numerical tokenization method, and a 3D tactile-language-action architecture for robotic manipulation.
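The general idea behind discretized numerical tokenization is to turn continuous physical quantities (e.g., force magnitude, principal axis orientation) into discrete tokens that a language model can align with tactile features. The sketch below shows only that general idea under stated assumptions. It is not the authors' implementation. The `Quantizer` class, the `contact_to_text` helper, the `<force_k>`/`<angle_k>` token format, and the value ranges and bin counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantizer:
    """Uniform binning of one continuous contact quantity (hypothetical config)."""
    name: str      # token prefix, e.g. "force"
    low: float     # lower bound of the assumed value range
    high: float    # upper bound of the assumed value range
    num_bins: int  # number of discrete tokens for this quantity

    def bin_index(self, value: float) -> int:
        # Clamp into [low, high] so out-of-range readings map to the edge bins.
        v = min(max(value, self.low), self.high)
        width = (self.high - self.low) / self.num_bins
        # value == high would otherwise yield index num_bins, so cap it.
        return min(int((v - self.low) / width), self.num_bins - 1)

    def token(self, value: float) -> str:
        # Render the bin as a discrete vocabulary token, e.g. "<force_6>".
        return f"<{self.name}_{self.bin_index(value)}>"

    def dequantize(self, idx: int) -> float:
        # Bin center: the value a decoder would recover from the token.
        width = (self.high - self.low) / self.num_bins
        return self.low + (idx + 0.5) * width


def contact_to_text(force_n: float, angle_deg: float,
                    force_q: Quantizer, angle_q: Quantizer) -> str:
    """Render a contact state as a caption with explicit numeric tokens (hypothetical format)."""
    return (f"contact force {force_q.token(force_n)} "
            f"principal axis {angle_q.token(angle_deg)}")


# Assumed ranges: 0-10 N force in 0.5 N bins; 0-180 deg axis angle in 5 deg bins.
FORCE = Quantizer("force", 0.0, 10.0, 20)
ANGLE = Quantizer("angle", 0.0, 180.0, 36)

if __name__ == "__main__":
    print(contact_to_text(3.2, 47.0, FORCE, ANGLE))
    # prints "contact force <force_6> principal axis <angle_9>"
```

Binning trades resolution for a finite vocabulary. The bin width (0.5 N here) bounds the worst-case quantization error at half a bin, and a regression head could recover a value from a token via `dequantize`.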
Findings
Achieved 95.9% classification accuracy
Reduced regression MAE by 52.6%
Minimal sim-to-real gap of 3.5%
Abstract
Recent advancements in integrating tactile sensing into vision-language-action (VLA) models have demonstrated transformative potential for robotic perception. However, existing tactile representations predominantly rely on qualitative descriptors (e.g., texture), neglecting quantitative contact states such as force magnitude, contact geometry, and principal axis orientation, which are indispensable for fine-grained manipulation. To bridge this gap, we propose FG-CLTP, a fine-grained contrastive language tactile pretraining framework. We first introduce a novel dataset comprising over 100k tactile 3D point cloud-language pairs that explicitly capture multidimensional contact states from the sensor's perspective. We then implement a discretized numerical tokenization mechanism to achieve quantitative-semantic alignment, effectively injecting explicit physical metrics into the multimodal…
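The abstract describes contrastive pretraining between tactile and language representations but does not give the exact objective here. A common choice for this kind of cross-modal alignment is a CLIP-style symmetric InfoNCE loss, sketched below in NumPy as an assumption rather than FG-CLTP's confirmed loss. The function names and the `temperature=0.07` default are illustrative. The embeddings stand in for the outputs of a tactile point-cloud encoder and a text encoder.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(tactile_emb: np.ndarray, text_emb: np.ndarray,
                       temperature: float = 0.07) -> float:
    """CLIP-style loss: the i-th tactile sample should match the i-th caption."""
    t = l2_normalize(tactile_emb)
    l = l2_normalize(text_emb)
    logits = t @ l.T / temperature                # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])              # positives lie on the diagonal
    loss_t2l = -log_softmax(logits, axis=1)[idx, idx].mean()  # tactile -> text
    loss_l2t = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> tactile
    return float((loss_t2l + loss_l2t) / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))
    print("matched pairs:", symmetric_info_nce(x, x))
    print("mismatched pairs:", symmetric_info_nce(x, x[::-1]))
```

Correctly paired batches yield a low loss, and mismatched pairings yield a high one. When every similarity is equal, the loss reduces to log B for batch size B.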
Taxonomy
Topics: Advanced Sensor and Energy Harvesting Materials · Robot Manipulation and Learning · Tactile and Sensory Interactions
