ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image
Dongyu Luo, Kelin Yu, Amir-Hossein Shahidzadeh, Cornelia Fermüller, Yiannis Aloimonos, Ruohan Gao

TL;DR
ControlTac is a framework that generates realistic, varied tactile images from a single reference tactile image, conditioned on contact force and contact position, for tactile data augmentation in robotic applications.
Contribution
It introduces a two-stage controllable method for generating tactile images conditioned on physical priors, improving data realism and transferability.
Findings
Augmenting tactile datasets with ControlTac is shown to be effective on three downstream tasks.
Generated tactile images are physically plausible and diverse.
Real-world experiments confirm practical utility.
Abstract
Vision-based tactile sensing has been widely used in perception, reconstruction, and robotic manipulation. However, collecting large-scale tactile data remains costly due to the localized nature of sensor-object interactions and inconsistencies across sensor instances. Existing approaches to scaling tactile data, such as simulation and free-form tactile generation, often suffer from unrealistic output and poor transferability to downstream tasks. To address this, we propose ControlTac, a two-stage controllable framework that generates realistic tactile images conditioned on a single reference tactile image, contact force, and contact position. With those physical priors as control input, ControlTac generates physically plausible and varied tactile images that can be used for effective data augmentation. Through experiments on three downstream tasks, we demonstrate that ControlTac can…
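The augmentation loop described in the abstract (one reference image, many force and position conditions) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `ContactCondition`, `sample_conditions`, `augment`, the scalar force in newtons, the pixel-shift position encoding, and the `generator` callable are all hypothetical names and simplifications that do not come from the paper.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class ContactCondition:
    """Hypothetical control input for one generated tactile image."""
    force_n: float  # assumed: contact force as a single scalar, in newtons
    dx_px: int      # assumed: horizontal contact shift, in pixels
    dy_px: int      # assumed: vertical contact shift, in pixels


def sample_conditions(
    forces: Sequence[float],
    shifts: Sequence[int],
    n: int,
    seed: int = 0,
) -> List[ContactCondition]:
    """Draw up to n distinct (force, position) conditions from a grid."""
    grid = [
        ContactCondition(f, dx, dy)
        for f, dx, dy in itertools.product(forces, shifts, shifts)
    ]
    rng = random.Random(seed)  # seeded so the augmentation set is reproducible
    return rng.sample(grid, min(n, len(grid)))


def augment(
    reference: Any,
    conditions: Sequence[ContactCondition],
    generator: Callable[[Any, ContactCondition], Any],
) -> List[Tuple[Any, ContactCondition]]:
    """Apply a conditional generator to one reference image per condition.

    `generator` stands in for any model mapping (reference, condition) to a
    tactile image; each output is paired with its condition so the condition
    can serve as a label downstream.
    """
    return [(generator(reference, c), c) for c in conditions]


if __name__ == "__main__":
    conds = sample_conditions([1.0, 2.0, 3.0], [-10, 0, 10], n=5)
    # Placeholder generator: returns the reference unchanged.
    pairs = augment("reference_image", conds, lambda ref, c: ref)
    print(len(pairs))
```

Pairing each generated image with the condition that produced it is a design choice of this sketch. It keeps the force and position values available as supervision for downstream tasks.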
Peer Reviews
Decision · Submitted to ICLR 2026
1. The lack of high-quality tactile images is indeed a bottleneck for the community. A reliable augmentation method would benefit tactile research.
2. The authors designed and conducted extensive downstream experiments, providing qualitative and quantitative results that show the effectiveness of their method.
1. Fig. 1 is not clear. From the context, I infer that the "Data Augmentation" block means "generate tactile images for the same object with different 3D force and contact pose", but this invariance is not mentioned here or in the related context.
2. Regarding Fig. 2 and the related discussion: Text2Tac and Vis2Tac are obviously NOT tactile data augmentation methods. These two do not give accurate tactile signals, but they try to align some tactile properties with other modalities in a generative mo
+ A quite interesting application of DiT+ControlNet to the under-explored domain of touch. The way the authors designed the conditioning signals for their model makes a lot of sense in the context of the work and what they are trying to achieve.
+ According to Tab. 1, the method recreates images from a tactile sensor for seen objects significantly more faithfully than a separate simulator.
+ I appreciated that the authors took their proposal to a real-world test including experiments with a real r
## Major
A. **Experiments in Sec. 4.1 are all in-domain:** As I understand it, the results in Sec. 4.1 cover only “in domain” experiments, meaning generation of images for known objects belonging to the same dataset used to train ControlTac. This gives the method and its ablations an unfair advantage over Taxim, which has not been fine-tuned for that specific category of objects. The gap is big enough that the proposed method might still be better, but I would have expected a generalizat
1. The paper introduces a controllable and physically grounded tactile data generation method, enabling fine-grained control and physical plausibility.
2. ControlTac can generate thousands of realistic tactile images from just one reference image.
3. This work conducts extensive experiments to validate the effectiveness of ControlTac, including real-world experiments.
4. Compared to simulation-based and free-form generative methods (e.g., Text2Tac, Vis2Tac), ControlTac produces more realistic an
There are several main issues in this paper that remain unaddressed:
1. In the downstream tasks, is ControlTac further fine-tuned, or does it directly use the FeelAnyForce pre-trained model in a zero-shot manner? If it is the latter, can a model trained on only 20,000 frames from FeelAnyForce truly support generalization to a wider range of more complex objects in more open environments? In the failure cases shown in the appendix, the model performs worse on objects with flat surfaces, rich tex
Taxonomy
Topics: Advanced Sensor and Energy Harvesting Materials · Tactile and Sensory Interactions · Soft Robotics and Applications
