TL;DR
LaST-R1 is a reinforcement learning (RL) post-training framework that combines latent reasoning with adaptive mechanisms to improve robotic manipulation performance and generalization in dynamic environments.
Contribution
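It presents Latent-to-Action Policy Optimization (LAPO), an RL algorithm that embeds latent Chain-of-Thought (CoT) reasoning in the optimization loop and jointly optimizes the latent reasoning process and action generation, with the aim of improving physical world modeling.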
Findings
Achieves a 99.9% success rate on the LIBERO benchmark with one-shot warm-up.
Yields up to a 22.5% improvement over the state of the art (SOTA) in real-world tasks.
Demonstrates strong generalization across simulated and real environments.
Abstract
Robotic foundation models require reasoning over complex visual scenes to execute adaptive actions in dynamic environments. While recent studies on latent-reasoning Vision-Language-Action (VLA) models have demonstrated the capability to capture fine-grained physical dynamics, they remain predominantly confined to static imitation learning, severely limiting their adaptability and generalization. In this paper, we present LaST-R1, a novel reinforcement learning (RL) post-training framework designed to effectively harness "latent reasoning-before-acting" policies. Specifically, we propose Latent-to-Action Policy Optimization (LAPO), a core RL algorithm that jointly optimizes the latent reasoning process and the action generation. By explicitly embedding latent Chain-of-Thought (CoT) reasoning directly within the RL optimization loop, LAPO stimulates profound physical world modeling, which…
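As a rough illustration of what "jointly optimizing the latent reasoning process and the action generation" could look like, the sketch below computes a generic PPO-style clipped surrogate in which the importance ratio covers both the latent reasoning step and the action. This is not the LAPO objective from the paper, whose exact form is not given in the text above. The function name, the argument names, the assumption that the policy assigns a log-probability to each latent reasoning element, and the clip value of 0.2 are all hypothetical choices made for illustration.

```python
import math


def clipped_joint_surrogate(logp_latent_new, logp_action_new,
                            logp_latent_old, logp_action_old,
                            advantage, clip_eps=0.2):
    """Generic PPO-style clipped surrogate for a single decision step.

    Assumption (not from the paper): the policy factorizes as
    p(latent | obs) * p(action | obs, latent), so the joint log-probability
    is the sum of the latent and action log-probabilities.
    """
    # Joint log-probability under the current and the behavior policy.
    logp_new = sum(logp_latent_new) + sum(logp_action_new)
    logp_old = sum(logp_latent_old) + sum(logp_action_old)

    # One importance ratio shared by the reasoning and action parts, so the
    # advantage signal reaches both of them.
    ratio = math.exp(logp_new - logp_old)

    # Standard PPO clipping keeps the update close to the behavior policy.
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical numbers: two latent elements and one action element.
    value = clipped_joint_surrogate(
        logp_latent_new=[-0.9, -1.1], logp_action_new=[-2.0],
        logp_latent_old=[-1.0, -1.2], logp_action_old=[-2.1],
        advantage=1.0,
    )
    print(value)
```

Sharing one ratio across the latent and action terms is only one possible design. A per-part ratio or separate clipping ranges would be equally plausible readings of "joint" optimization.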
