How to Train your Tactile Model: Tactile Perception with Multi-fingered Robot Hands
Christopher J. Ford, Kaichen Shi, Laura Butcher, Nathan F. Lepora, Efi Psomopoulou

TL;DR
This paper introduces TacViT, a Vision Transformer-based tactile perception model that generalizes to previously unseen sensors, reducing the sensor-specific data collection and retraining needed to deploy new tactile sensors on robot hands.
Contribution
The paper presents TacViT, a novel transformer-based model that improves generalization in tactile perception for multi-fingered robots, outperforming CNNs on unseen sensors.
Findings
TacViT generalizes to new, previously unseen tactile sensors without sensor-specific retraining.
It reduces the need for extensive sensor-specific data collection.
TacViT outperforms CNN-based methods in contact property inference.
Abstract
Rapid deployment of new tactile sensors is essential for scalable robotic manipulation, especially in multi-fingered hands equipped with vision-based tactile sensors. However, current methods for inferring contact properties rely heavily on convolutional neural networks (CNNs), which, while effective on known sensors, require large, sensor-specific datasets. Furthermore, they require retraining for each new sensor due to differences in lens properties, illumination, and sensor wear. Here we introduce TacViT, a novel tactile perception model based on Vision Transformers, designed to generalize on new sensor data. TacViT leverages global self-attention mechanisms to extract robust features from tactile images, enabling accurate contact property inference even on previously unseen sensors. This capability significantly reduces the need for data collection and retraining, accelerating the…
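To make the "global self-attention" idea in the abstract concrete, here is a minimal NumPy sketch of the general Vision Transformer data flow: split a tactile image into patches, embed them, let every patch attend to every other patch, then pool and regress contact properties. This is not the authors' TacViT architecture. The image size, patch size, embedding width, number of outputs, and all function names are assumptions chosen for illustration, the weights are random rather than trained, and positional embeddings, multiple heads, and stacked layers are omitted.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W) image into flattened, non-overlapping patch x patch tiles."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tiles = image.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return tiles.reshape(-1, patch * patch)  # (num_patches, patch * patch)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over all patch tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights

# Hypothetical sizes: a 32x32 grayscale tactile image, 8x8 patches,
# 16-dimensional tokens, and 3 contact properties to regress.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
patch, dim, n_out = 8, 16, 3

w_embed = rng.normal(size=(patch * patch, dim)) * 0.1
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))
w_head = rng.normal(size=(dim, n_out)) * 0.1

tokens = patchify(image, patch) @ w_embed            # (16, 16): 16 patch tokens
attended, weights = global_self_attention(tokens, w_q, w_k, w_v)
prediction = attended.mean(axis=0) @ w_head          # (3,): untrained output

print(weights.shape, prediction.shape)
```

The attention matrix here is 16 x 16, so every patch can draw on information from the whole image in a single layer, whereas a convolution only sees a local neighborhood at each layer. That whole-image view is the property the abstract credits for robust feature extraction.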
