STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models

Feng Xu; Guangyao Zhai; Xin Kong; Tingzhong Fu; Daniel F.N. Gordon; Xueli An; Benjamin Busam

arXiv:2512.05107 · cs.RO · December 25, 2025

TL;DR

This paper introduces STARE, a stage-aware reinforcement module that decomposes long-horizon actions into meaningful stages, improving fine-tuning of vision-language-action models for robotic manipulation tasks.

Contribution

The paper proposes a novel stage-aware reinforcement approach integrated into existing optimization methods, enabling more precise credit assignment and stable training in VLA models.
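To make the idea of stage-level credit assignment concrete, the following is a minimal illustrative sketch and not the paper's algorithm. It contrasts a single trajectory-level outcome, broadcast to every step, with per-stage outcomes assigned only to the steps inside each stage. The function names, the stage boundaries, and the binary per-stage success signals are all hypothetical assumptions made for illustration.

```python
from typing import List, Sequence


def trajectory_level_credit(num_steps: int, success: bool) -> List[float]:
    """Coarse credit: every step receives the same terminal outcome."""
    outcome = 1.0 if success else 0.0
    return [outcome] * num_steps


def stage_level_credit(
    stage_ends: Sequence[int], stage_success: Sequence[bool]
) -> List[float]:
    """Stage-wise credit: each step receives the outcome of its own stage.

    stage_ends holds the exclusive end index of each stage and must be
    strictly increasing (a hypothetical, hand-specified segmentation).
    """
    if len(stage_ends) != len(stage_success):
        raise ValueError("one success flag is required per stage")
    credit: List[float] = []
    start = 0
    for end, ok in zip(stage_ends, stage_success):
        if end <= start:
            raise ValueError("stage_ends must be strictly increasing")
        credit.extend([1.0 if ok else 0.0] * (end - start))
        start = end
    return credit


# Hypothetical 6-step pick-and-place episode:
# reach (steps 0-1), grasp (steps 2-3), place (steps 4-5).
# Reach and grasp succeed, place fails, so the episode as a whole fails.
coarse = trajectory_level_credit(6, success=False)
dense = stage_level_credit([2, 4, 6], [True, True, False])

print(coarse)  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(dense)   # [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

In this toy episode the trajectory-level signal penalizes the successful reach and grasp steps along with the failed placement, while the stage-wise signal isolates the failure to the final stage. How STARE actually segments trajectories and shapes rewards is described in the paper itself.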

Findings

01. Achieved a state-of-the-art success rate of 98.0% on SimplerEnv.

02. Achieved a state-of-the-art success rate of 96.4% on ManiSkill3.

03. Demonstrated substantial performance improvements over existing methods.

Abstract

Recent advances in Vision-Language-Action (VLA) models, powered by large language models and reinforcement learning-based fine-tuning, have shown remarkable progress in robotic manipulation. Existing methods often treat long-horizon actions as linguistic sequences and apply trajectory-level optimization methods such as Trajectory-wise Preference Optimization (TPO) or Proximal Policy Optimization (PPO), leading to coarse credit assignment and unstable training. However, unlike language, where a unified semantic meaning is preserved despite flexible sentence order, action trajectories progress through causally chained stages with different learning difficulties. This motivates progressive stage optimization. We therefore present Stage-Aware Reinforcement (STARE), a module that decomposes a long-horizon action trajectory into semantically meaningful stages and provides dense, interpretable,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning