Generating Visual Scenes from Touch
Fengyu Yang, Jiacheng Zhang, Andrew Owens

TL;DR
This paper introduces a latent diffusion model that generates images from tactile signals, and vice versa. It outperforms prior work on tactile-driven stylization and tackles new visuo-tactile synthesis problems.
Contribution
We develop the first model capable of generating images solely from touch signals, significantly improving tactile-driven stylization and enabling novel visuo-tactile synthesis tasks.
Findings
Outperforms previous tactile-driven stylization methods
First to generate images from touch without extra scene info
Addresses new synthesis problems, such as generating images without the touch sensor or the hand holding it, and shading estimation
Abstract
An emerging line of work has sought to generate plausible imagery from touch. Existing approaches, however, tackle only narrow aspects of the visuo-tactile synthesis problem, and lag significantly behind the quality of cross-modal synthesis methods in other domains. We draw on recent advances in latent diffusion to create a model for synthesizing images from tactile signals (and vice versa) and apply it to a number of visuo-tactile synthesis tasks. Using this model, we significantly outperform prior work on the tactile-driven stylization problem, i.e., manipulating an image to match a touch signal, and we are the first to successfully generate images from touch without additional sources of information about the scene. We also successfully use our model to address two novel synthesis problems: generating images that do not contain the touch sensor or the hand holding it, and estimating…
Taxonomy
Topics: Tactile and Sensory Interactions · Interactive and Immersive Displays · Virtual Reality Applications and Impacts
Methods: Diffusion
