RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models

Hongyin Zhang; Shuo Zhang; Junxi Jin; Qixin Zeng; Runze Li; and Donglin Wang

arXiv:2511.01331 · cs.RO · December 2, 2025

Open Access · 3 Reviews

TL;DR

RobustVLA introduces a robustness-aware reinforcement post-training method that significantly improves the reliability of vision-language-action models in noisy and uncertain robotic environments.

Contribution

The paper presents RobustVLA, a novel online RL post-training approach with Jacobian and smoothness regularizations to enhance VLA model robustness against environmental disturbances.
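To make the two regularizers concrete, here is a minimal sketch on a toy policy. It is not the authors' implementation: the `tanh` policy, the finite-difference Jacobian, the function names, and the weights `alpha` and `beta` are all hypothetical, and the smoothness term follows the consecutive-policy bound $\left\lVert \pi_t - \pi_{t-1} \right\rVert_\infty \leq \delta_t$ quoted in the reviews rather than the paper's actual loss.

```python
import math

def policy(weights, state):
    """Toy deterministic policy (assumption): action_i = tanh(sum_j W[i][j] * s[j])."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]

def jacobian_penalty(weights, state, eps=1e-5):
    """Squared Frobenius norm of d(action)/d(state), estimated by central differences."""
    total = 0.0
    for j in range(len(state)):
        plus = list(state)
        plus[j] += eps
        minus = list(state)
        minus[j] -= eps
        a_plus, a_minus = policy(weights, plus), policy(weights, minus)
        total += sum(((p - m) / (2 * eps)) ** 2 for p, m in zip(a_plus, a_minus))
    return total

def smoothness_penalty(weights_new, weights_old, states):
    """Largest per-dimension action change between two consecutive policies
    over the sampled states (a sampled proxy for the sup-norm gap)."""
    return max(
        abs(a - b)
        for s in states
        for a, b in zip(policy(weights_new, s), policy(weights_old, s))
    )

def regularized_loss(task_loss, weights_new, weights_old, states, alpha=0.1, beta=0.1):
    """Task loss plus the two penalties; alpha and beta are hypothetical weights."""
    jac = sum(jacobian_penalty(weights_new, s) for s in states) / len(states)
    smooth = smoothness_penalty(weights_new, weights_old, states)
    return task_loss + alpha * jac + beta * smooth
```

In this sketch `task_loss` stands in for whatever RL objective is being minimized. Shrinking the policy weights lowers the Jacobian penalty, and keeping the new weights close to the old ones lowers the smoothness penalty. For a real VLA model the Jacobian would more plausibly be taken with automatic differentiation on a low-dimensional embedding than with finite differences on raw inputs.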

Findings

01. RobustVLA outperforms prior methods in robustness and reliability.

02. Jacobian regularization reduces sensitivity to observation noise.

03. Smoothness regularization stabilizes policies under action perturbations.
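A short worked bound illustrates why limiting the Jacobian limits sensitivity to observation noise. This is the standard first-order argument, not the paper's Theorem 1, and it assumes the policy $\pi$ is differentiable in the state with $\left\lVert \nabla_s \pi(s') \right\rVert \leq \lambda$ for every $s'$ on the segment between $s$ and $s + \epsilon$, where $\epsilon$ is a hypothetical observation perturbation:

$$
\pi(s + \epsilon) - \pi(s) = \int_0^1 \nabla_s \pi(s + \tau \epsilon)\, \epsilon \, d\tau
\quad\Longrightarrow\quad
\left\lVert \pi(s + \epsilon) - \pi(s) \right\rVert \leq \int_0^1 \left\lVert \nabla_s \pi(s + \tau \epsilon) \right\rVert \left\lVert \epsilon \right\rVert d\tau \leq \lambda \left\lVert \epsilon \right\rVert .
$$

Under these assumptions, a smaller $\lambda$ means a perturbation of size $\left\lVert \epsilon \right\rVert$ can move the action by at most $\lambda \left\lVert \epsilon \right\rVert$.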

Abstract

Vision-Language-Action (VLA) models have recently emerged as powerful general-purpose policies for robotic manipulation, benefiting from large-scale multi-modal pre-training. However, they often fail to generalize reliably in out-of-distribution deployments, where unavoidable disturbances such as observation noise, sensor errors, or actuation perturbations become prevalent. While recent Reinforcement Learning (RL)-based post-training provides a practical means to adapt pre-trained VLA models, existing methods mainly emphasize reward maximization and overlook robustness to environmental uncertainty. In this work, we introduce RobustVLA, a lightweight online RL post-training method designed to explicitly enhance the resilience of VLA models. Through a systematic robustness analysis, we identify two key regularizations: Jacobian regularization, which mitigates sensitivity to observation…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. High Practical Relevance: The work astutely targets a well-known and significant vulnerability of VLA models: their fragility to observational and action perturbations in out-of-distribution (OOD) scenarios. The focus on enhancing robustness has substantial practical value for real-world robotic deployment.
2. Solid Theoretical Foundation: The theoretical analysis is a key strength. The authors establish explicit upper bounds on the performance gap under observation perturbations, action per

Weaknesses

1. The authors note that computing the Jacobian directly on high-dimensional pixel inputs is prohibitively expensive and instead calculate it on the low-dimensional embeddings from the Llama-2 encoder used by OpenVLA-OFT. However, the paper fails to specify precisely which layer's output is used as the surrogate for the states. This lack of detail makes it difficult to fully assess the implementation's validity.
2. All experiments rely on a single, fixed pre-trained Llama-2 encoding. The work d

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The robustness of VLA models is important.
2. Most of the idea makes sense.
3. The experimental results are good, and the authors include ablation studies.

Despite the flaws in Section 4.1 (see Weaknesses), this paper is still understandable, and the method is somewhat reasonable. I believe this paper is marginally above the acceptance threshold.

Weaknesses

1. In Section 4.1, the motivation for introducing the bounded Jacobian ($\left\lVert \nabla_s \pi_t(s) \right\rVert \leq \lambda$) and $\left\lVert \pi_t - \pi_{t-1} \right\rVert_\infty \leq \delta_t$ is not explained, and not all of the notation is defined.
2. Why do we need to introduce $\left\lVert \pi_t - \pi_{t-1} \right\rVert_\infty \leq \delta_t$? I believe $\left\lVert \nabla_s \pi_t(s) \right\rVert \leq \lambda$ is already sufficient to bound the gap even when actions are perturbed.
3. Wh

Reviewer 03 · Rating 4 · Confidence 2

Strengths

- This paper highlights the critical robustness issue in VLA models.
- The method builds on a reliable, widely adopted VLA baseline.
- The paper introduces five carefully designed observation perturbations in a common benchmark.

Weaknesses

- The perturbation assumption may not be practical and seems inconsistent with the experimental setup. The state deviation is oversimplified by adding noise to the state space, and the dynamics are assumed to be Lipschitz continuous.
- There are some issues in the proof of Theorem 1, and it appears inconsistent with the algorithm. Please refer to the questions for details.
- While the study of robustness/generalization is an established topic in visual reinforcement learning [1,2,3], adding nois

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning