TL;DR
TouchAnything uses a pretrained visual diffusion model as a prior to accurately reconstruct 3D object geometry from sparse tactile touches, outperforming existing methods and enabling open-world reconstruction.
Contribution
TouchAnything transfers knowledge from pretrained visual diffusion models to tactile-based 3D reconstruction, removing the need to train tactile-specific models.
Findings
Reconstructs accurate 3D geometries from few touches.
Outperforms existing baseline methods.
Enables open-world reconstruction of unseen objects.
Abstract
Accurate object geometry estimation is essential for many downstream tasks, including robotic manipulation and physical interaction. Although vision is the dominant modality for shape perception, it becomes unreliable under occlusions or challenging lighting conditions. In such scenarios, tactile sensing provides direct geometric information through physical contact. However, reconstructing global 3D geometry from sparse local touches alone is fundamentally underconstrained. We present TouchAnything, a framework that leverages a pretrained large-scale 2D vision diffusion model as a semantic and geometric prior for 3D reconstruction from sparse tactile measurements. Unlike prior work that trains category-specific reconstruction networks or learns diffusion models directly from tactile data, we transfer the geometric knowledge encoded in pretrained visual diffusion models to the tactile…
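To make the "fundamentally underconstrained" point concrete, here is a minimal sketch that is not taken from the paper. It assumes six idealized point contacts on the coordinate axes and three hypothetical candidate shapes written as implicit functions (the names `sdf_sphere`, `sdf_box`, `sdf_octahedron`, and `touch_residual` are illustrative only). All three surfaces pass exactly through every touch, yet they disagree elsewhere, which is the gap a prior has to fill.

```python
import numpy as np

def sdf_sphere(p, r=1.0):
    # Signed distance to a sphere of radius r centered at the origin.
    return np.linalg.norm(p, axis=-1) - r

def sdf_box(p, h=1.0):
    # Signed distance to an axis-aligned cube with half-side h.
    q = np.abs(p) - h
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(np.max(q, axis=-1), 0.0)
    return outside + inside

def sdf_octahedron(p, s=1.0):
    # Implicit function whose zero level set is the octahedron |x|+|y|+|z| = s.
    # Away from the surface it is a scaled bound, not an exact distance.
    return (np.sum(np.abs(p), axis=-1) - s) / np.sqrt(3.0)

# Six hypothetical touch locations: one contact on each coordinate axis.
touches = np.array([
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
], dtype=float)

def touch_residual(sdf, points):
    # Largest distance between any touch point and the candidate surface.
    return float(np.max(np.abs(sdf(points))))

candidates = {
    "sphere": sdf_sphere,
    "cube": sdf_box,
    "octahedron": sdf_octahedron,
}

# Every candidate explains the touches perfectly (residual 0).
residuals = {name: touch_residual(f, touches) for name, f in candidates.items()}

# Away from the contacts the candidates disagree: this probe point lies
# inside the cube but outside the sphere and the octahedron.
probe = np.array([0.6, 0.6, 0.6])
probe_values = {name: float(f(probe)) for name, f in candidates.items()}

if __name__ == "__main__":
    print(residuals)
    print(probe_values)
```

Because the touch data alone cannot rank these candidates, some additional source of shape knowledge is required; the abstract states that TouchAnything obtains it from a pretrained 2D vision diffusion model.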
