# Visual-to-Tactile Cross-Modal Generation Using a Class-Conditional GAN with Multi-Scale Discriminator and Hybrid Loss

**Authors:** Nikolay Neshov, Krasimir Tonchev, Agata Manolova, Radostina Petkova, Ivaylo Bozhilov

PMC · DOI: 10.3390/s26020426 · Sensors (Basel, Switzerland) · 2026-01-09

## TL;DR

This paper introduces a class-conditional GAN that translates texture images into vibrotactile spectrograms, which can then be converted into vibrotactile signals for haptic rendering and virtual reality applications.

## Contribution

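The paper proposes a class-conditional GAN with a multi-scale discriminator and a hybrid loss for visual-to-tactile translation, and validates it on samples from the LMT-108 dataset against pix2pix, pix2pixHD, Residue-Fusion GAN, and several ablated versions.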

## Key findings

- The model's generated spectrograms are closer to real spectrograms, as measured by LPIPS and FID, than those of pix2pix, pix2pixHD, Residue-Fusion GAN, and several ablated versions.
- Class conditioning via a DenseNet-201 label predictor enhances the quality of generated vibrotactile spectrograms.
- The hybrid loss combining adversarial, L1, and feature matching losses improves generator performance.
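
The hybrid generator loss named in the last finding can be sketched as a weighted sum of its three terms. This is a minimal illustration in plain Python, not the authors' implementation. The function names, the binary cross-entropy form of the adversarial term, and the weights `lambda_l1` and `lambda_fm` are all assumptions, since this summary does not state them.

```python
import math


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one discriminator output probability."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length flat sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def hybrid_generator_loss(d_probs_fake, fake, real, feats_fake, feats_real,
                          lambda_l1=100.0, lambda_fm=10.0):
    """Adversarial + weighted L1 + weighted feature-matching loss.

    d_probs_fake: one probability per discriminator scale for the fake sample.
    fake, real:   flattened generated and ground-truth spectrograms.
    feats_*:      lists of flattened intermediate discriminator features.
    lambda_*:     hypothetical weights, not taken from the paper.
    """
    # The generator wants every discriminator to label its output as real (1).
    adv = sum(bce(p, 1.0) for p in d_probs_fake) / len(d_probs_fake)
    # Pixel-wise L1 reconstruction term.
    l1 = mean_abs_diff(fake, real)
    # Feature matching: L1 between discriminator features, averaged over layers.
    fm = sum(mean_abs_diff(f, r)
             for f, r in zip(feats_fake, feats_real)) / len(feats_fake)
    return adv + lambda_l1 * l1 + lambda_fm * fm
```

When the generated sample and its discriminator features match the real ones exactly, the L1 and feature-matching terms vanish and only the adversarial term remains.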

## Abstract

Understanding surface textures through visual cues is crucial for applications in haptic rendering and virtual reality. However, accurately translating visual information into tactile feedback remains a challenging problem. To address this challenge, this paper presents a class-conditional Generative Adversarial Network (cGAN) for cross-modal translation from texture images to vibrotactile spectrograms, using samples from the LMT-108 dataset. The generator is adapted from pix2pix and enhanced with Conditional Batch Normalization (CBN) at the bottleneck to incorporate texture class semantics. A dedicated label predictor, based on a DenseNet-201 and trained separately prior to cGAN training, provides the conditioning label. The discriminator is derived from pix2pixHD and uses a multi-scale architecture with three discriminators, each comprising three downsampling layers. A grid search over multi-scale discriminator configurations shows that this setup yields optimal perceptual similarity measured by Learned Perceptual Image Patch Similarity (LPIPS). The generator is trained using a hybrid loss that combines adversarial, L1, and feature matching losses derived from intermediate discriminator features, while the discriminators are trained using standard adversarial loss. Quantitative evaluation with LPIPS and Fréchet Inception Distance (FID) confirms superior similarity to real spectrograms. GradCAM visualizations highlight the benefit of class conditioning. The proposed model outperforms pix2pix, pix2pixHD, Residue-Fusion GAN, and several ablated versions. The generated spectrograms can be converted into vibrotactile signals using the Griffin–Lim algorithm, enabling applications in haptic feedback and virtual material simulation.
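
To make the role of Conditional Batch Normalization concrete, the sketch below normalizes each feature channel over the batch and then applies a scale and shift selected by the texture class label. It is an illustration under stated assumptions rather than the paper's layer: the per-class lookup tables `gammas` and `betas`, the function name, and the flat `N x C` feature layout are hypothetical, and the abstract does not say how the class label is mapped to these parameters.

```python
import math


def conditional_batch_norm(batch, labels, gammas, betas, eps=1e-5):
    """Batch-normalize features, then scale and shift them per class.

    batch:  list of N feature vectors, each with C channels.
    labels: class index for each of the N samples.
    gammas: per-class scales, shape num_classes x C (assumed lookup table).
    betas:  per-class shifts, shape num_classes x C (assumed lookup table).
    """
    n = len(batch)
    channels = len(batch[0])
    out = [[0.0] * channels for _ in range(n)]
    for c in range(channels):
        column = [row[c] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n  # biased batch variance
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            k = labels[i]  # the class label picks the affine parameters
            out[i][c] = gammas[k][c] * (column[i] - mean) * inv_std + betas[k][c]
    return out
```

Two samples with the same normalized activation but different labels receive different outputs, which is how the class information reaches the bottleneck features in this sketch.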

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12845744/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12845744/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/PMC12845744/full.md

---
Source: https://tomesphere.com/paper/PMC12845744