LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning

Hao Chen; Jiaming Liu; Zhonghao Yan; Nuowei Han; Renrui Zhang; Chenyang Gu; Jialin Gao; Ziyu Guo; Siyuan Qian; Yinxi Wang; Peng Jia; Shanghang Zhang; Pheng-Ann Heng

arXiv:2604.28192 · cs.RO · May 8, 2026

TL;DR

LaST-R1 is a reinforcement learning (RL) post-training framework that optimizes latent reasoning jointly with action generation, using adaptive latent reasoning to improve robotic manipulation performance and generalization in dynamic environments.

Contribution

The paper presents Latent-to-Action Policy Optimization (LAPO), an RL algorithm that jointly optimizes the latent reasoning process and action generation by embedding latent Chain-of-Thought (CoT) reasoning directly in the RL optimization loop.

Findings

1. Achieves a 99.9% success rate on the LIBERO benchmark with one-shot warm-up.

2. Yields up to a 22.5% improvement over the prior state of the art on real-world tasks.

3. Demonstrates strong generalization in both simulated and real-world environments.

Abstract

Robotic foundation models require reasoning over complex visual scenes to execute adaptive actions in dynamic environments. While recent studies on latent-reasoning Vision-Language-Action (VLA) models have demonstrated the capability to capture fine-grained physical dynamics, they remain predominantly confined to static imitation learning, severely limiting their adaptability and generalization. In this paper, we present LaST-R1, a novel reinforcement learning (RL) post-training framework designed to effectively harness "latent reasoning-before-acting" policies. Specifically, we propose Latent-to-Action Policy Optimization (LAPO), a core RL algorithm that jointly optimizes the latent reasoning process and the action generation. By explicitly embedding latent Chain-of-Thought (CoT) reasoning directly within the RL optimization loop, LAPO stimulates profound physical world modeling, which…

Code & Models

Repositories

chen-h01/LaST-R1 (GitHub)
