PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations

Yang Zhang; Jiangyuan Zhao; Chenyou Fan; Fangzheng Yan; Tian Li; Haitong Tang; Sen Fu; Xuan'er Wu; Qizhen Weng; Weinan Zhang; Xiu Li; Chi Zhang; Chenjia Bai; Xuelong Li

arXiv:2604.27472 · cs.AI · May 1, 2026

TL;DR

PRTS brings goal-conditioned reinforcement learning to vision-language-action (VLA) models, improving their temporal reasoning and their understanding of goal reachability in robotic control.

Contribution

PRTS reformulates VLA pretraining as goal-conditioned reinforcement learning with contrastive embeddings, improving reasoning and planning in robotic models.

Findings

01. State-of-the-art performance on multiple benchmarks.

02. Significant gains on long-horizon and zero-shot tasks.

03. Better understanding of goal reachability improves success rates.

Abstract

Vision-Language-Action (VLA) models advance robotic control via strong visual-linguistic priors. However, existing VLAs predominantly frame pretraining as supervised behavior cloning, overlooking the fundamental nature of robot learning as a goal-reaching process that requires understanding temporal task progress. We present PRTS (Primitive Reasoning and Tasking System), a VLA foundation model that reformulates pretraining through goal-conditioned reinforcement learning. By treating language instructions as goals and employing contrastive reinforcement learning, PRTS learns a unified embedding space in which the inner product of state-action and goal embeddings approximates the log of the discounted goal occupancy, i.e., the (discounted) probability of reaching the language-specified goal from the current state-action pair, quantitatively assessing physical feasibility beyond…
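The contrastive mechanism described in the abstract can be illustrated with a generic contrastive-RL critic. The following is a minimal NumPy sketch, not the PRTS implementation. It assumes the state-action embeddings phi(s, a) and language-goal embeddings psi(g) have already been computed by some encoders (the encoders, the batch size B, and the width D are all hypothetical) and that row i of each array forms a positive pair drawn from the same trajectory. Every pair is scored with an inner product, and the standard InfoNCE objective is applied: each sample's own goal is the positive and the other goals in the batch serve as negatives. In the contrastive-RL literature, the optimum of this kind of objective makes the inner product approximate the log of the discounted goal occupancy up to normalizing terms (the goal marginal and, for InfoNCE, a per-state-action constant).

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable log-softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def critic(sa_emb: np.ndarray, goal_emb: np.ndarray) -> np.ndarray:
    """Pairwise critic f[i, j] = phi(s_i, a_i) . psi(g_j); shape (B, B)."""
    return sa_emb @ goal_emb.T


def infonce_loss(sa_emb: np.ndarray, goal_emb: np.ndarray) -> float:
    """InfoNCE loss over a batch of positive (state-action, goal) pairs.

    Row i treats goal i as its positive and the other B - 1 goals in the
    batch as negatives, i.e. a B-way classification with label i.
    """
    log_probs = log_softmax(critic(sa_emb, goal_emb), axis=1)
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, D = 8, 16  # hypothetical batch size and embedding width
    # Hypothetical stand-ins for encoder outputs: psi(g) for language goals
    # and phi(s, a) for state-action pairs.
    goal_emb = rng.normal(size=(B, D))
    aligned_sa = goal_emb + 0.1 * rng.normal(size=(B, D))  # near its own goal
    random_sa = rng.normal(size=(B, D))  # unrelated to the goals
    print("aligned pairs:", infonce_loss(aligned_sa, goal_emb))
    print("random pairs: ", infonce_loss(random_sa, goal_emb))
```

Minimizing this loss pushes each state-action embedding toward the goal it actually reached and away from the other goals in the batch. With all-zero state-action embeddings every score is equal and the loss is exactly log B, the chance level for a B-way choice; in a real system the encoders would be trained by gradient descent on this objective rather than held fixed as here.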