TL;DR
This paper introduces a deep learning framework that generates tactile data (accelerometer signals) from visual images of material surfaces. It uses a conditional GAN with a residue-fusion (RF) module, trained with additional feature-matching (FM) and perceptual losses, to support cross-modal perception in robotics.
Contribution
The paper proposes a conditional-GAN architecture with a residue-fusion (RF) module, trained with additional feature-matching (FM) and perceptual losses, for cross-modal visual-tactile data translation.
Findings
Improved classification accuracy with generated tactile data.
Enhanced visual similarity between generated and real data.
Performance gains from the residue-fusion (RF) module and from the feature-matching (FM) and perceptual losses.
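As an illustration of the feature-matching (FM) idea named above, here is a minimal sketch in plain Python. It assumes the common formulation in which the loss is the mean absolute difference between discriminator features of a real and a generated sample, averaged over layers; the function name, the list-of-lists feature layout, and the equal layer weighting are hypothetical choices for this example and may differ from what the paper actually uses.

```python
from typing import Sequence


def feature_matching_loss(
    real_feats: Sequence[Sequence[float]],
    fake_feats: Sequence[Sequence[float]],
) -> float:
    """Average, over layers, of the mean absolute difference between features.

    Each argument holds one flat feature vector per discriminator layer
    (a hypothetical layout chosen for this sketch).
    """
    if len(real_feats) != len(fake_feats) or len(real_feats) == 0:
        raise ValueError("expected the same non-zero number of layers")

    total = 0.0
    for real_layer, fake_layer in zip(real_feats, fake_feats):
        if len(real_layer) != len(fake_layer) or len(real_layer) == 0:
            raise ValueError("layer feature vectors must have equal, non-zero length")
        # Mean L1 distance for this layer.
        layer_l1 = sum(abs(r - f) for r, f in zip(real_layer, fake_layer))
        total += layer_l1 / len(real_layer)

    # Equal weight per layer (an assumption of this sketch).
    return total / len(real_feats)
```

In a GAN training loop, a term like this would typically be added to the generator objective so that generated samples produce discriminator activations close to those of real samples.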
Abstract
Existing psychophysical studies have revealed that cross-modal visual-tactile perception is common for humans performing daily activities. However, it is still challenging to build the algorithmic mapping from one modality space to another, namely cross-modal visual-tactile data translation/generation, which could be potentially important for robotic operation. In this paper, we propose a deep-learning-based approach for cross-modal visual-tactile data generation by leveraging the framework of generative adversarial networks (GANs). Our approach takes the visual image of a material surface as the visual data, and the accelerometer signal induced by the pen-sliding movement on the surface as the tactile data. We adopt the conditional-GAN (cGAN) structure together with the residue-fusion (RF) module, and train the model with the additional feature-matching (FM) and perceptual…
