Sensor-Invariant Tactile Representation
Harsh Gupta, Yuchen Mo, Shengmiao Jin, Wenzhen Yuan

TL;DR
This paper presents a transformer-based method for creating sensor-invariant tactile representations, enabling zero-shot transfer across different optical tactile sensors and improving generalization in robotic tactile perception.
Contribution
It introduces a novel approach using transformers trained on simulated data to achieve sensor-invariance, addressing a key transferability challenge in tactile sensing.
Findings
Effective zero-shot transfer across diverse tactile sensors
Generalizes well from simulation to real-world sensors
Facilitates data and model transferability in tactile applications
Abstract
High-resolution tactile sensors have become critical for embodied perception and robotic manipulation. However, a key challenge in the field is the lack of transferability between sensors due to design and manufacturing variations, which result in significant differences in tactile signals. This limitation hinders the ability to transfer models or knowledge learned from one sensor to another. To address this, we introduce a novel method for extracting Sensor-Invariant Tactile Representations (SITR), enabling zero-shot transfer across optical tactile sensors. Our approach utilizes a transformer-based architecture trained on a diverse dataset of simulated sensor designs, allowing it to generalize to new sensors in the real world with minimal calibration. Experimental results demonstrate the method's effectiveness across various tactile sensing applications, facilitating data and model…
Peer Reviews
Decision: ICLR 2025 Poster
The idea of using simple calibration images is interesting and inspiring. I believe this design should become a standard technique in future tactile representation learning. The performance of the learned representation is quite good. On the three downstream tasks they evaluate, it significantly outperforms baselines and other methods in this area. It also presents comprehensive ablation experiments on the role of calibration images and contrastive learning losses. The paper provides nice vis
I believe there should be another important ablation experiment: training UniT/T3 on the simulated images collected in this paper. This is because there are two major differences between previous state-of-the-art methods (UniT/T3) and the proposed method. The first difference is the specific architectural design, such as contrastive learning, the transformer, and the use of calibration images. The second is the use of simulation data versus real-world data. This paper already presents experiment
1. Sensor variance, which is the problem this paper tries to solve, is very important in the field of vision-based tactile sensing. Recently, different methods [1, 2, 3] have been proposed to address this problem, and this work proposes a model that is complementary to the prior works.
2. The framework proposed by this paper does not require any sensor-specific design in the model architecture, which seems to be its major contribution. Instead of designing different encoders or tokens for speci
1. The dataset the model is trained on all comes from simulation. The tactile simulator uses 3D meshes of the object to synthesize RGB tactile images, which assumes that all objects are rigid and without micro-geometry on the object surface (e.g., wood grain). Although this might be sufficient for geometry-based tasks such as shape reconstruction and object classification, the features extracted from this model may lack the ability to classify the fine-grained details (e.g., smoothness, hardness
1. Originality and Innovation: The integration of a transformer-based architecture with supervised contrastive learning to achieve sensor-invariant tactile representation is a novel approach within the tactile sensing field. The method is original and demonstrates the potential for addressing the long-standing issue of sensor-specific dependency in tactile applications.
2. Technical Rigor and Methodological Soundness: The paper presents a technically robust framework, comprising a two-stage proc
1. Limited Generalizability: The approach is specifically tailored to GelSight sensors, limiting its generalizability to other types of tactile sensors. GelSight sensors are vision-based, and the proposed method may not be easily transferable to non-vision-based tactile sensors, such as capacitive or resistive types. This restriction diminishes the broader applicability of the findings, potentially limiting its impact within the ICLR community.
2. Practical Constraints of GelSight Sensors: GelSi
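The reviews above refer to supervised contrastive learning as one ingredient of the method. As a rough illustration only, the sketch below implements the generic supervised contrastive (SupCon) loss in plain Python. It is not taken from the paper: the function names, the temperature value, and the idea that a label stands for one contact configuration seen through different sensors are all assumptions made for this example.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a small batch.

    embeddings: list of equal-length float lists, one per tactile sample.
    labels: hypothetical contact-configuration ids; samples that share a
        label (e.g. the same contact seen by different sensors) are
        treated as positives for each other.
    temperature: assumed value, not a setting reported in the source.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Scaled cosine similarity between the anchor and every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits.values())
        log_denom = peak + math.log(
            sum(math.exp(value - peak) for value in logits.values())
        )
        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two contact configurations, each observed by two sensors.
labels = [0, 0, 1, 1]
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(supcon_loss(aligned, labels) < supcon_loss(mixed, labels))
```

In the toy batch, the loss is lower when samples with the same label share an embedding regardless of which sensor produced them, which is the behaviour a sensor-invariant representation is meant to encourage.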
Taxonomy
Topics: Advanced Sensor and Energy Harvesting Materials · Tactile and Sensory Interactions · Neural Networks and Reservoir Computing
