Investigating Vision Foundational Models for Tactile Representation Learning
Ben Zandonati, Ruohan Wang, Ruihan Gao, Yan Wu

TL;DR
This paper explores using vision foundational models for tactile representation learning in robotics, transforming tactile data into a visual format to improve task performance and generalisability across sensors and tasks.
Contribution
The paper introduces a sensor-agnostic transformation technique that recasts tactile representation learning as a computer vision problem, so that existing CV methods can be applied directly to tactile data.
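To make the idea of feeding tactile readouts to a vision model concrete, here is a minimal sketch of one possible tactile-to-image transform. It is not the authors' method: this summary does not specify their transformation. The function name `tactile_to_image`, the min-max normalisation, the nearest-neighbour resize, and the default 224-pixel output size are all assumptions chosen for illustration.

```python
import numpy as np


def tactile_to_image(readout, size=224):
    """Map a tactile readout to a (size, size, 3) uint8 image.

    Hypothetical illustration only: the normalisation, resize scheme, and
    output size are assumptions, not the transformation used in the paper.
    """
    x = np.asarray(readout, dtype=np.float64)
    if x.ndim == 1:
        # Treat a flat taxel vector as a single-row "image".
        x = x[np.newaxis, :]
    if x.ndim != 2:
        raise ValueError("expected a 1-D or 2-D readout")

    # Min-max normalise to [0, 1]; a constant readout maps to all zeros.
    lo, hi = x.min(), x.max()
    scale = hi - lo
    if scale > 0:
        x = (x - lo) / scale
    else:
        x = np.zeros_like(x)

    # Nearest-neighbour resize: map each output pixel to a source index.
    rows = np.arange(size) * x.shape[0] // size
    cols = np.arange(size) * x.shape[1] // size
    resized = x[np.ix_(rows, cols)]

    # Replicate the single channel three times to mimic an RGB input.
    gray = np.round(resized * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)
```

Under these assumptions, readouts of different shapes (for example a time-by-taxel matrix from one sensor and a flat taxel vector from another) end up with the same image shape, which is what lets a single pretrained vision backbone consume them.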
Findings
Significant performance improvements on benchmark tasks.
Enhanced model robustness and transferability.
Effective cross-sensor and cross-task generalisation.
Abstract
Tactile representation learning (TRL) equips robots with the ability to leverage touch information, boosting performance in tasks such as environment perception and object manipulation. However, the heterogeneity of tactile sensors results in many sensor- and task-specific learning approaches. This limits the efficacy of existing tactile datasets, and the subsequent generalisability of any learning outcome. In this work, we investigate the applicability of vision foundational models to sensor-agnostic TRL, via a simple yet effective transformation technique to feed the heterogeneous sensor readouts into the model. Our approach recasts TRL as a computer vision (CV) problem, which permits the application of various CV techniques for tackling TRL-specific challenges. We evaluate our approach on multiple benchmark tasks, using datasets collected from four different tactile sensors.…
Taxonomy
Topics: Tactile and Sensory Interactions · Advanced Sensor and Energy Harvesting Materials · EEG and Brain-Computer Interfaces
