SARL: Spatially-Aware Self-Supervised Representation Learning for Visuo-Tactile Perception
Gurmeher Khurana, Lan Wei, Dandan Zhang

TL;DR
SARL is a spatially-aware self-supervised learning framework for visuo-tactile perception. It preserves spatial structure in the learned feature representations, which improves performance on robotic manipulation tasks that depend on geometry and texture understanding.
Contribution
This work presents SARL, a self-supervised learning (SSL) framework that adds map-level objectives to preserve spatial information in fused visuo-tactile images. It outperforms existing SSL methods on downstream manipulation tasks.
Findings
SARL achieves a 30% reduction in mean absolute error (MAE) on edge-pose regression.
SARL outperforms nine SSL baselines across six downstream tasks.
Structured spatial equivariance is key for effective visuo-tactile perception.
Abstract
Contact-rich robotic manipulation requires representations that encode local geometry. Vision provides global context but lacks direct measurements of properties such as texture and hardness, whereas touch supplies these cues. Modern visuo-tactile sensors capture both modalities in a single fused image, yielding intrinsically aligned inputs that are well suited to manipulation tasks requiring visual and tactile information. Most self-supervised learning (SSL) frameworks, however, compress feature maps into a global vector, which discards spatial structure and is poorly matched to the needs of manipulation. To address this, we propose SARL, a spatially-aware SSL framework that augments the Bootstrap Your Own Latent (BYOL) architecture with three map-level objectives: Saliency Alignment (SAL), Patch-Prototype Distribution Alignment (PPDA), and Region Affinity Matching (RAM), to keep…
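The abstract's point about global vectors discarding spatial structure can be illustrated with a small NumPy sketch. It is not SARL's implementation: the definitions of SAL, PPDA, and RAM are not given in the text above, so none of them is reproduced here. The sketch only contrasts the standard BYOL regression loss (2 minus twice the cosine similarity) computed on globally average-pooled features with the same loss computed at every spatial location. The function names, the per-location variant, and the toy feature maps are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale x to unit L2 norm along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def byol_loss_global(online_map, target_map):
    """BYOL-style loss on globally average-pooled features.

    Both inputs have shape (C, H, W). Pooling over H and W removes
    all information about where each feature occurred.
    """
    p = l2_normalize(online_map.mean(axis=(1, 2)))
    z = l2_normalize(target_map.mean(axis=(1, 2)))
    return 2.0 - 2.0 * float(p @ z)


def byol_loss_per_location(online_map, target_map):
    """Hypothetical map-level variant: the same loss at each (h, w), averaged.

    This is an illustrative stand-in, not one of SARL's objectives.
    """
    p = l2_normalize(online_map, axis=0)
    z = l2_normalize(target_map, axis=0)
    cos = (p * z).sum(axis=0)  # cosine similarity per location, shape (H, W)
    return float((2.0 - 2.0 * cos).mean())


# Toy feature map and a spatially flipped copy: same content, different layout.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
flipped = feat[:, ::-1, ::-1]

global_loss = byol_loss_global(feat, flipped)
map_loss = byol_loss_per_location(feat, flipped)
same_map_loss = byol_loss_per_location(feat, feat)

print(f"global-vector loss, flipped layout: {global_loss:.6f}")
print(f"per-location loss, flipped layout:  {map_loss:.6f}")
print(f"per-location loss, identical maps:  {same_map_loss:.6f}")
```

In this toy case the pooled loss is essentially zero even though the two maps place their features in different locations, while the per-location loss is clearly nonzero. That gap is the spatial information a global vector cannot see.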
Taxonomy
Topics: Tactile and Sensory Interactions · Robot Manipulation and Learning · Advanced Sensor and Energy Harvesting Materials
