TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization

Zengjue Chen; Runliang Niu; He Kong; Qi Wang; Qianli Xing; Zipei Fan

arXiv:2506.08440 · cs.RO · September 30, 2025

Open Access

TL;DR

This paper introduces TGRPO, an RL-based training framework for VLA models that uses trajectory grouping and language model-generated rewards to enhance robotic task performance and generalization.

Contribution

TGRPO is a novel online RL framework that leverages trajectory grouping and language model-assisted dense rewards for improved VLA model training.

Findings

1. Achieved an 80.7% success rate on LIBERO benchmark tasks.
2. Outperformed supervised fine-tuning and other RL methods.
3. Reduced variance and improved convergence in robotic tasks.

Abstract

Vision-Language-Action (VLA) models have demonstrated strong cross-scenario generalization in a variety of robotic tasks through large-scale pre-training and task-specific fine-tuning. However, their training paradigm relies mainly on manually collected successful demonstrations, which makes it difficult for them to adapt to complex environments when they encounter out-of-distribution (OOD) scenarios or execution biases. While Reinforcement Learning (RL) provides a closed-loop optimization framework via an active trial-and-error mechanism, it suffers from sparse rewards, high variance, and unstable optimization in long-horizon robotic tasks. To address these limitations, we propose Trajectory-based Group Relative Policy Optimization (TGRPO), an online RL-based training framework for VLA models. TGRPO leverages task analysis generated by a large language model to automatically construct dense…
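
As a rough illustration of the "group relative" idea in the method's name, the following is a minimal sketch that assumes the standard GRPO-style normalization: each trajectory's return has the group mean subtracted and is divided by the group standard deviation. The helper name `group_relative_advantages`, the `eps` constant, and the sample returns are all hypothetical. The paper's actual trajectory-wise advantage and its language-model-generated reward design are not reproduced here.

```python
from statistics import fmean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """Normalize per-trajectory returns within one group (hypothetical helper).

    Assumes plain GRPO-style normalization; this is not the paper's exact formula.
    """
    mean = fmean(returns)
    std = pstdev(returns)  # population standard deviation of the group
    # eps avoids division by zero when every trajectory has the same return
    return [(r - mean) / (std + eps) for r in returns]


# Hypothetical returns of four trajectories sampled for the same task.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Trajectories that score above the group average get a positive advantage and those below get a negative one, so the policy update depends on relative rather than absolute reward within the group.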

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning