TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization
Zengjue Chen, Runliang Niu, He Kong, Qi Wang, Qianli Xing, Zipei Fan

TL;DR
This paper introduces TGRPO, an RL-based training framework for VLA models that uses trajectory grouping and language-model-generated rewards to improve robotic task performance and generalization.
Contribution
TGRPO is a novel online RL framework that leverages trajectory grouping and language-model-assisted dense rewards for improved VLA model training.
Findings
Achieved an 80.7% success rate on LIBERO benchmark tasks.
Outperformed supervised fine-tuning and other RL methods.
Reduced variance and improved convergence in robotic tasks.
Abstract
Vision-Language-Action (VLA) models have demonstrated strong cross-scenario generalization capabilities in various robotic tasks through large-scale pre-training and task-specific fine-tuning. However, their training paradigm relies mainly on manually collected successful demonstrations, which makes it difficult to adapt to complex environments when encountering out-of-distribution (OOD) scenarios or execution biases. While Reinforcement Learning (RL) provides a closed-loop optimization framework via an active trial-and-error mechanism, it suffers from sparse rewards, high variance, and unstable optimization in long-horizon robotic tasks. To address these limitations, we propose Trajectory-based Group Relative Policy Optimization (TGRPO), an online RL-based training framework for VLA models. TGRPO leverages task analysis generated by a large language model to automatically construct dense…
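The excerpt names the optimization scheme but stops before its details. As an illustration only, the sketch below shows the standard group-relative advantage used by GRPO-style methods, applied at the trajectory level: each rollout's summed reward is normalized against the mean and standard deviation of its group, and that advantage then scales a PPO-style clipped probability ratio. The function names, the choice of summed per-step rewards as the trajectory return, and the clip value of 0.2 are assumptions made for this sketch, not TGRPO's published formulation.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def trajectory_returns(rewards_per_traj: Sequence[Sequence[float]]) -> List[float]:
    """Collapse each trajectory's per-step (dense) rewards into one scalar.

    Assumption: a plain undiscounted sum is used as the trajectory return.
    """
    return [float(sum(rs)) for rs in rewards_per_traj]


def group_relative_advantages(returns: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each return against its group: (R_i - mean) / (std + eps).

    Trajectories better than the group average get positive advantages,
    worse ones get negative advantages; no learned critic is needed.
    """
    mu = mean(returns)
    sigma = pstdev(returns)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in returns]


def clipped_surrogate(logp_new: float, logp_old: float,
                      advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective for one sample, returned as a loss to minimize.

    Assumption: clip_eps = 0.2 is a common default, not a value from the paper.
    """
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    return -min(unclipped, clipped)


if __name__ == "__main__":
    # Hypothetical group of three rollouts for the same task prompt.
    group = [[0, 1, 1], [0, 0, 0], [1, 1, 1]]
    adv = group_relative_advantages(trajectory_returns(group))
    print(adv)  # best rollout (index 2) gets the largest advantage
    print(clipped_surrogate(math.log(2.0), 0.0, adv[2]))
```

Normalizing within the group is what removes the need for a separate value network: each rollout is judged only relative to its siblings sampled under the same conditions, which is one common way GRPO-style methods reduce gradient variance.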
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
