TL;DR
This paper introduces PDF, a test-time adaptation method for Vision-Language-Action (VLA) models that improves robustness to environmental shifts without fine-tuning the model, using uncertainty-based augmentation and delayed-feedback correction.
Contribution
PDF is a novel verifier-free framework that improves VLA decision-making robustness through uncertainty-driven augmentation and a lightweight perturbation module, without requiring fine-tuning.
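The sketch below shows one way uncertainty-driven augmentation with action voting could be wired up. It makes several assumptions: the action head is discrete and exposes logits, the uncertainty signal is the Shannon entropy of the softmax, and a linear entropy-to-budget rule stands in for PDF's adaptive scheduler. Voting is plain majority over argmax actions. `policy`, `augment`, `augmentation_budget`, and `vote_action` are hypothetical names, and PDF's actual uncertainty measure, scheduler, and voting rule may differ.

```python
import math
import random
from collections import Counter
from typing import Any, Callable, List, Optional, Sequence

# Hypothetical interfaces: observation -> action logits, and a perturbation of the observation.
Policy = Callable[[Any], List[float]]
Augment = Callable[[Any, random.Random], Any]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats, used here as the uncertainty signal (an assumption)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


def augmentation_budget(uncertainty: float, num_actions: int, max_budget: int) -> int:
    """Hypothetical scheduler: the budget grows linearly with normalized entropy."""
    max_entropy = math.log(num_actions)
    if max_entropy <= 0.0:
        return 0
    frac = min(max(uncertainty / max_entropy, 0.0), 1.0)
    return round(frac * max_budget)


def vote_action(policy: Policy, observation: Any, augment: Augment,
                max_budget: int = 8, rng: Optional[random.Random] = None) -> int:
    rng = rng or random.Random(0)
    logits = policy(observation)
    budget = augmentation_budget(entropy(softmax(logits)), len(logits), max_budget)
    # The unaugmented prediction casts the first vote; confident steps stop here (budget 0).
    votes = Counter([argmax(logits)])
    for _ in range(budget):
        votes[argmax(policy(augment(observation, rng)))] += 1
    # Majority vote; ties go to the action that received its first vote earliest.
    return votes.most_common(1)[0][0]
```

Consider a toy `policy` that barely prefers action 0 on the raw observation but prefers action 1 on every perturbed copy. Its near-maximal entropy triggers the full budget, and the vote returns action 1. A confident policy, by contrast, spends no augmentation at all. This mirrors the abstract's point that augmentation budgets trade performance against efficiency.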
Findings
+7.4% success rate on LIBERO
+10.3 human normalized score on Atari
Consistent performance gains over baseline models
Abstract
Vision-Language-Action models (VLAs) achieve remarkable performance in sequential decision-making but remain fragile to subtle environmental shifts, such as small changes in object pose. We attribute this brittleness to trajectory overfitting, where VLAs over-attend to the spurious correlation between actions and entities, then reproduce memorized action patterns. We propose Perturbation learning with Delayed Feedback (PDF), a verifier-free test-time adaptation framework that improves decision performance without fine-tuning the base model. PDF mitigates the spurious correlation through uncertainty-based data augmentation and action voting, while an adaptive scheduler allocates augmentation budgets to balance performance and efficiency. To further improve stability, PDF learns a lightweight perturbation module that retrospectively adjusts action logits guided by delayed feedback,…
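The abstract is cut off before it describes the perturbation module in detail, so the following is only a guess at its general shape. It is a minimal sketch that assumes the module is a learned additive offset on the frozen policy's action logits. It also assumes that "delayed feedback" means a scalar outcome, such as episode success, that arrives only after several actions have been taken. REINFORCE on the offset serves as a stand-in update rule. `LogitPerturbation`, `act`, and `feedback` are hypothetical names, and PDF's actual module and training signal may differ.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LogitPerturbation:
    """Hypothetical lightweight module: one learned additive offset per action.

    The frozen base policy's logits are never modified; only `delta` is trained.
    """

    def __init__(self, num_actions: int, lr: float = 0.1, seed: int = 0) -> None:
        self.delta = [0.0] * num_actions
        self.lr = lr
        self.rng = random.Random(seed)
        # (base_logits, action) pairs waiting for the delayed outcome.
        self.pending: List[Tuple[List[float], int]] = []

    def adjusted(self, base_logits: Sequence[float]) -> List[float]:
        return [b + d for b, d in zip(base_logits, self.delta)]

    def act(self, base_logits: Sequence[float]) -> int:
        # Sample from the perturbed distribution so the REINFORCE estimate is valid.
        probs = softmax(self.adjusted(base_logits))
        action = self.rng.choices(range(len(probs)), weights=probs, k=1)[0]
        self.pending.append((list(base_logits), action))
        return action

    def feedback(self, reward: float, baseline: float = 0.0) -> None:
        """Retrospectively credit every pending step once the outcome is known.

        Uses d/d(delta) log softmax(base + delta)[a] = onehot(a) - probs,
        accumulated with `delta` held fixed, then applied as one ascent step.
        """
        advantage = reward - baseline
        grad_total = [0.0] * len(self.delta)
        for base_logits, action in self.pending:
            probs = softmax(self.adjusted(base_logits))
            for i, p in enumerate(probs):
                grad_total[i] += (1.0 if i == action else 0.0) - p
        self.delta = [d + self.lr * advantage * g for d, g in zip(self.delta, grad_total)]
        self.pending.clear()
```

Only `delta` is trained, so the base VLA stays frozen, consistent with the abstract's "without fine-tuning the base model." A per-action offset is close to the smallest module that can still shift the policy's choices after the fact.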
