Sensor-Invariant Tactile Representation

Harsh Gupta; Yuchen Mo; Shengmiao Jin; Wenzhen Yuan

arXiv:2502.19638 · cs.RO · March 14, 2025

Open Access · 1 dataset · 3 reviews

TL;DR

This paper presents a transformer-based method for creating sensor-invariant tactile representations, enabling zero-shot transfer across different optical tactile sensors and improving generalization in robotic tactile perception.

Contribution

It introduces a transformer trained on data from many simulated sensor designs to achieve sensor invariance, addressing the lack of transferability between tactile sensors, a key challenge in tactile sensing.

Findings

01. Enables zero-shot transfer across diverse optical tactile sensors

02. Generalizes from simulated training data to real-world sensors

03. Facilitates data and model transferability in tactile applications

Abstract

High-resolution tactile sensors have become critical for embodied perception and robotic manipulation. However, a key challenge in the field is the lack of transferability between sensors due to design and manufacturing variations, which result in significant differences in tactile signals. This limitation hinders the ability to transfer models or knowledge learned from one sensor to another. To address this, we introduce a novel method for extracting Sensor-Invariant Tactile Representations (SITR), enabling zero-shot transfer across optical tactile sensors. Our approach utilizes a transformer-based architecture trained on a diverse dataset of simulated sensor designs, allowing it to generalize to new sensors in the real world with minimal calibration. Experimental results demonstrate the method's effectiveness across various tactile sensing applications, facilitating data and model…
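
The abstract says a single transformer adapts to a new sensor "with minimal calibration", which suggests conditioning the model on a few reference images captured by that sensor. Below is a minimal sketch of one way such an input sequence could be assembled, assuming ViT-style non-overlapping patch tokens and a segment id per source image. `patchify`, `build_sequence`, `constant_image`, and the segment scheme are hypothetical illustrations, not the paper's code.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]  # H x W grid of RGB pixels


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W RGB image into non-overlapping patch x patch tokens."""
    rows, cols = len(image), len(image[0])
    if rows % patch or cols % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tokens = []
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            token: List[float] = []
            for r in range(r0, r0 + patch):
                for c in range(c0, c0 + patch):
                    token.extend(image[r][c])  # flatten the RGB values row by row
            tokens.append(token)
    return tokens


def build_sequence(tactile: Image, calibration: Sequence[Image], patch: int):
    """Concatenate tactile-image tokens with tokens from each calibration image.

    Hypothetical scheme: segment id 0 marks the tactile image and k marks the
    k-th calibration image, so a transformer could tell the sources apart
    through a learned segment embedding.
    """
    tokens, segments = [], []
    for seg_id, img in enumerate([tactile, *calibration]):
        for tok in patchify(img, patch):
            tokens.append(tok)
            segments.append(seg_id)
    return tokens, segments


def constant_image(h: int, w: int, value: float) -> Image:
    """Toy stand-in for a real sensor frame."""
    return [[(value, value, value) for _ in range(w)] for _ in range(h)]


tokens, segments = build_sequence(
    constant_image(8, 8, 0.5),                          # tactile frame
    [constant_image(8, 8, 0.1 * k) for k in range(3)],  # three calibration frames
    patch=4,
)
print(len(tokens), len(tokens[0]))  # 16 tokens, each 4 * 4 * 3 = 48 values
```

Keeping the calibration frames in the same token sequence means sensor-specific information reaches the model as data rather than as a sensor-specific encoder, which is the property the abstract emphasizes.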

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The idea of using simple calibration images is interesting and inspiring. I believe this design should become a standard technique in future tactile representation learning. The performance of the learned representation is quite good. On the three downstream tasks they evaluate, it significantly outperforms baselines and other methods in this area. It also presents comprehensive ablation experiments on the role of calibration images and contrastive learning losses. The paper provides nice vis…

Weaknesses

I believe there should be another important ablation experiment: training UniT/T3 on the simulated images collected in this paper. This is because there are two major differences between previous state-of-the-art methods (UniT/T3) and the proposed method. The first difference is the specific architectural design, such as contrastive learning, the transformer, and the use of calibration images. The second is the use of simulation data versus real-world data. This paper already presents experiment…

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. Sensor variance, which is the problem this paper tries to solve, is very important in the field of vision-based tactile sensing. Recently, different methods [1, 2, 3] have been proposed to address this problem, and this work proposes a model that is complementary to the prior works.
2. The framework proposed by this paper does not require any sensor-specific design in the model architecture, which seems to be its major contribution. Instead of designing different encoders or tokens for speci…

Weaknesses

1. The dataset the model is trained on all comes from simulation. The tactile simulator uses 3D meshes of the object to synthesize RGB tactile images, which assumes that all objects are rigid and without micro-geometry on the object surface (e.g., wood grain). Although this might be sufficient for geometry-based tasks such as shape reconstruction and object classification, the features extracted from this model may lack the ability to classify the fine-grained details (e.g., smoothness, hardness…
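
The mesh-to-RGB simulation this reviewer describes can be illustrated with a toy photometric renderer: a contact height map is converted to surface normals, which are then shaded by a few colored directional lights using Lambertian reflectance. Randomizing the light layout per "sensor" is one simple way to emulate the variation in sensor designs that the abstract mentions. This is a hedged sketch: the function names, the Lambertian model, and the light parameter ranges are illustrative assumptions, not the simulator used in the paper.

```python
import math
import random


def surface_normals(height, dx=1.0):
    """Unit surface normals of a height map via central differences."""
    rows, cols = len(height), len(height[0])
    normals = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Indices are clamped at the border, so edge gradients are approximate.
            left, right = height[r][max(c - 1, 0)], height[r][min(c + 1, cols - 1)]
            up, down = height[max(r - 1, 0)][c], height[min(r + 1, rows - 1)][c]
            gx, gy = (right - left) / (2 * dx), (down - up) / (2 * dx)
            norm = math.sqrt(gx * gx + gy * gy + 1.0)
            row.append((-gx / norm, -gy / norm, 1.0 / norm))
        normals.append(row)
    return normals


def random_sensor_lights(seed, n_lights=3):
    """Hypothetical 'sensor design': jittered light directions and random colors."""
    rng = random.Random(seed)
    lights = []
    for k in range(n_lights):
        azimuth = 2 * math.pi * k / n_lights + rng.uniform(-0.3, 0.3)
        elevation = rng.uniform(0.2, 0.8)  # radians above the gel surface
        direction = (
            math.cos(elevation) * math.cos(azimuth),
            math.cos(elevation) * math.sin(azimuth),
            math.sin(elevation),
        )
        color = tuple(rng.uniform(0.0, 1.0) for _ in range(3))
        lights.append((direction, color))
    return lights


def render(height, lights, ambient=0.05):
    """Lambertian shading: each light adds color * max(0, n . l) to every pixel."""
    image = []
    for row in surface_normals(height):
        out_row = []
        for n in row:
            pixel = [ambient, ambient, ambient]
            for direction, color in lights:
                shade = max(0.0, sum(a * b for a, b in zip(n, direction)))
                for ch in range(3):
                    pixel[ch] += color[ch] * shade
            out_row.append(tuple(pixel))
        image.append(out_row)
    return image


# One contact (a paraboloid bump) "seen" by two randomly designed sensors.
size = 16
bump = [[max(0.0, 16.0 - (r - 8) ** 2 - (c - 8) ** 2) / 8.0 for c in range(size)]
        for r in range(size)]
img_a = render(bump, random_sensor_lights(seed=0))
img_b = render(bump, random_sensor_lights(seed=1))
```

The same geometry produces different RGB images under the two light layouts, which is exactly the sensor-to-sensor gap a sensor-invariant encoder has to bridge. Because the input is a pure height map, the sketch also shows the reviewer's point: surface properties such as softness or micro-texture not encoded in the mesh never reach the rendered image.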

Reviewer 03 · Rating 6 · Confidence 5

Strengths

1. Originality and Innovation: The integration of a transformer-based architecture with supervised contrastive learning to achieve sensor-invariant tactile representation is a novel approach within the tactile sensing field. The method is original and demonstrates the potential for addressing the long-standing issue of sensor-specific dependency in tactile applications.
2. Technical Rigor and Methodological Soundness: The paper presents a technically robust framework, comprising a two-stage proc…
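
Supervised contrastive learning (the SupCon formulation of Khosla et al., 2020) pulls together embeddings that share a label and pushes apart all others. In a sensor-invariance setting, one natural labeling, assumed here rather than taken from the paper, is the contact itself: the same indentation rendered through different simulated sensors shares a label. A minimal pure-Python sketch of the loss under that assumption; `supcon_loss` and the temperature value are illustrative, not the paper's training code.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """SupCon loss averaged over anchors that have at least one positive."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Cosine similarity to every other sample, scaled by the temperature.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        m = max(sims.values())  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Mean negative log-probability of each positive among all other samples.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Labels are contact ids; each pair is one contact seen by two simulated sensors.
labels = [0, 0, 1, 1]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # grouped by contact
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]    # grouped by sensor
print(supcon_loss(aligned, labels) < supcon_loss(mixed, labels))  # True
```

The loss is small when embeddings cluster by contact and large when they cluster by sensor, which is the pressure toward sensor invariance that this reviewer credits to the method.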

Weaknesses

1. Limited Generalizability: The approach is specifically tailored to GelSight sensors, limiting its generalizability to other types of tactile sensors. GelSight sensors are vision-based, and the proposed method may not be easily transferable to non-vision-based tactile sensors, such as capacitive or resistive types. This restriction diminishes the broader applicability of the findings, potentially limiting its impact within the ICLR community.
2. Practical Constraints of GelSight Sensors: GelSi…

Code & Models

Datasets

hgupt3/sitr_dataset
dataset · 200 downloads

Taxonomy

Topics: Advanced Sensor and Energy Harvesting Materials · Tactile and Sensory Interactions · Neural Networks and Reservoir Computing