TL;DR
AT-VLA introduces an adaptive tactile injection mechanism and a dual-stream design that let vision-language-action (VLA) robotic models use real-time tactile feedback. This addresses the limitations current VLAs show in contact-rich physical interaction.
Contribution
The paper proposes AT-VLA with adaptive tactile injection and dual-stream processing, enabling real-time tactile responses without disrupting pretrained capabilities.
Findings
Demonstrates effectiveness on contact-rich manipulation tasks
Achieves real-time tactile responses within 0.04 seconds
Shows improved integration of tactile feedback
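The 0.04-second figure corresponds to a 25 Hz response rate (1 / 0.04 s). A dual-rate loop is one generic way a fast tactile stream can stay responsive while a slow VLA is still running inference. The sketch below assumes that structure and is not taken from the paper. The names `dual_rate_loop`, `plan`, `read_force`, `target_force`, and `gain` are hypothetical, and the proportional force correction stands in for whatever the tactile stream actually computes.

```python
from typing import Callable, List, Tuple

# Hypothetical dual-rate control loop; NOT the AT-VLA implementation.
# A slow planner (standing in for VLA inference) is queried only every
# `ticks_per_plan` fast ticks, while a fast tactile correction runs every tick.


def dual_rate_loop(
    n_fast_ticks: int,
    ticks_per_plan: int,
    plan: Callable[[int], float],
    read_force: Callable[[int], float],
    target_force: float,
    gain: float,
) -> List[Tuple[int, float]]:
    """Return (tick, command) pairs produced by the fast loop."""
    commands: List[Tuple[int, float]] = []
    nominal = 0.0
    for tick in range(n_fast_ticks):
        if tick % ticks_per_plan == 0:
            nominal = plan(tick)  # slow, expensive call (assumed)
        # Fresh tactile reading every tick; simple proportional correction (assumed).
        error = target_force - read_force(tick)
        commands.append((tick, nominal + gain * error))
    return commands


FAST_PERIOD_S = 0.04  # response time reported in the summary
print(f"fast-loop rate: {1.0 / FAST_PERIOD_S:.1f} Hz")
```

With the 0.04 s fast period and an assumed planner ten times slower (`ticks_per_plan=10`), the command is refreshed 25 times per second while the planner is queried only 2.5 times per second.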
Abstract
Vision-Language-Action (VLA) models have significantly advanced the capabilities of robotic agents in executing diverse tasks; however, they still face challenges in contact-rich manipulation scenarios that require precise physical interactions. To address this limitation, recent studies have attempted to incorporate tactile signals during downstream tasks, enabling pretrained VLAs to interpret tactile feedback. Nevertheless, introducing new modalities during finetuning, which are rarely present during pretraining, may disrupt the pretrained capabilities of VLAs. In addition, the inherently slow inference speed of VLAs hampers real-time responsiveness and limits the effective utilization of tactile feedback for action adjustment. To overcome these challenges, we propose Adaptive Tactile Vision-Language-Action (AT-VLA), which introduces a novel Adaptive Tactile Injection mechanism.…
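The abstract notes that injecting a modality absent from pretraining can disrupt a pretrained VLA. A widely used safeguard is a zero-initialized gate that makes the new pathway an identity map at the start of finetuning. The tanh-gated cross-attention layers in Flamingo are one example. The sketch below shows that generic idea on plain Python lists. It is an assumption for illustration, not AT-VLA's Adaptive Tactile Injection, whose details are not given here. `GatedTactileInjection` and its `gate` parameter are hypothetical names.

```python
import math
from typing import List


class GatedTactileInjection:
    """Hypothetical gated residual injection of tactile features (NOT AT-VLA's method).

    With gate = 0, tanh(gate) = 0, so pretrained features pass through unchanged;
    tactile information only flows in once the gate is learned away from zero.
    """

    def __init__(self, gate: float = 0.0) -> None:
        self.gate = gate  # would be a learnable scalar in a real model; plain float here

    def __call__(self, tokens: List[float], tactile: List[float]) -> List[float]:
        if len(tokens) != len(tactile):
            raise ValueError("tokens and tactile features must have the same length")
        scale = math.tanh(self.gate)  # bounded in (-1, 1)
        return [x + scale * t for x, t in zip(tokens, tactile)]


# At initialization the module leaves the pretrained features untouched.
print(GatedTactileInjection()([1.0, 2.0], [5.0, -3.0]))
```

Bounding the gate with tanh keeps the tactile contribution from overwhelming the pretrained features even if the raw gate value grows large during finetuning.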