TextToucher: Fine-Grained Text-to-Touch Generation
Jiahang Tu, Hao Fu, Fengyu Yang, Hanbin Zhao, Chao Zhang, Hui Qian

TL;DR
TextToucher is a fine-grained text-to-touch generation framework that uses a multimodal large language model and dual-grain text conditioning (object-level and sensor-level) to generate high-quality tactile images for multimodal and embodied AI.
Contribution
The paper proposes a new method combining object-level and sensor-level text descriptions with diffusion transformers for detailed tactile image generation, along with a novel evaluation metric.
Findings
Outperforms existing tactile generation methods
Produces high-quality, detailed tactile samples
Demonstrates effectiveness through extensive experiments
Abstract
Tactile sensation plays a crucial role in the development of multi-modal large models and embodied intelligence. To collect tactile data at as low a cost as possible, a series of studies have attempted to generate tactile images by vision-to-touch image translation. However, compared with the text modality, tactile generation driven by the visual modality cannot accurately depict human tactile sensation. In this work, we analyze the characteristics of tactile images in detail at two granularities: object-level (tactile texture, tactile shape) and sensor-level (gel status). We model these granularities of information through text descriptions and propose a fine-grained Text-to-Touch generation method (TextToucher) to generate high-quality tactile samples. Specifically, we introduce a multimodal large language model to build the text sentences about object-level tactile information and employ a…
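To make the two granularities named in the abstract concrete, the following is a minimal sketch of how object-level and sensor-level descriptions might be kept as separate text conditions. It is an illustration only, not the authors' implementation: the class `DualGrainCondition`, its field names, and the prompt wording are all hypothetical, and the example values are invented placeholders rather than data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualGrainCondition:
    """Hypothetical container for the two text granularities in the abstract."""

    texture: str     # object-level: tactile texture
    shape: str       # object-level: tactile shape
    gel_status: str  # sensor-level: state of the sensor gel

    def object_level_text(self) -> str:
        # Object-level description covers texture and shape together.
        return f"texture: {self.texture}; shape: {self.shape}"

    def sensor_level_text(self) -> str:
        # Sensor-level description covers only the gel status.
        return f"gel status: {self.gel_status}"

    def as_prompts(self) -> dict[str, str]:
        # Keep the granularities separate so a downstream generator
        # could encode or inject each one differently.
        return {
            "object": self.object_level_text(),
            "sensor": self.sensor_level_text(),
        }


# Placeholder values, chosen only for illustration.
cond = DualGrainCondition(
    texture="coarse woven fabric",
    shape="flat with raised ridges",
    gel_status="clean, unworn gel",
)
prompts = cond.as_prompts()
print(prompts["object"])
print(prompts["sensor"])
```

Keeping the two descriptions as separate strings, rather than one merged caption, is a design assumption here; it simply mirrors the object-level versus sensor-level split the abstract describes.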
Taxonomy
Topics: Speech and dialogue systems · Human Motion and Animation · Interactive and Immersive Displays
Methods: Sparse Evolutionary Training · Diffusion
