Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting

Wanpeng Zhang; Hao Luo; Sipeng Zheng; Yicheng Feng; Haiweng Xu; Ziheng Xi; Chaoyi Xu; Haoqi Yuan; Zongqing Lu

arXiv:2603.16542 · cs.RO · March 18, 2026

TL;DR

This paper introduces Posterior-Transition Reweighting (PTR), an offline post-training method for robot policies that gives more weight to more reliable training samples, improving robustness on heterogeneous datasets.

Contribution

The paper proposes PTR, a reward-free, conservative post-training approach that reweights training samples based on their attributable post-action consequences, enhancing offline robot policy learning.

Findings

01 PTR improves policy robustness on heterogeneous datasets.

02 The method is compatible with diffusion and flow-matching action heads.

03 PTR outperforms uniform weighting in offline adaptation tasks.

Abstract

Offline post-training adapts a pretrained robot policy to a target dataset by supervised regression on recorded actions. In practice, robot datasets are heterogeneous: they mix embodiments, camera setups, and demonstrations of varying quality, so many trajectories reflect recovery behavior, inconsistent operator skill, or weakly informative supervision. Uniform post-training gives equal credit to all samples and can therefore average over conflicting or low-attribution data. We propose Posterior-Transition Reweighting (PTR), a reward-free and conservative post-training method that decides how much each training sample should influence the supervised update. For each sample, PTR encodes the observed post-action consequence as a latent target, inserts it into a candidate pool of mismatched targets, and uses a separate transition scorer to estimate a softmax identification posterior over…
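
The identification step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract is cut off before it says what the posterior ranges over or how it is turned into a sample weight, so the sketch stops at the posterior and treats the weights as given inputs. The dot-product scorer, the `temperature` parameter, and all function names are hypothetical stand-ins.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product, used here as a stand-in transition scorer (assumption)."""
    return sum(x * y for x, y in zip(a, b))


def identification_posterior(
    query: Sequence[float],
    true_target: Sequence[float],
    mismatched_targets: List[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """Softmax probability that the scorer picks the true target from the pool."""
    # Candidate pool: the observed target at index 0, then the mismatched ones.
    pool = [true_target] + list(mismatched_targets)
    logits = [dot(query, t) / temperature for t in pool]
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    return exps[0] / sum(exps)


def weighted_regression_loss(
    preds: List[Sequence[float]],
    actions: List[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Per-sample squared error on recorded actions, averaged with given weights.

    How PTR derives the weights is not specified in the visible abstract,
    so they are passed in rather than computed here.
    """
    total = 0.0
    for p, a, w in zip(preds, actions, weights):
        total += w * sum((pi - ai) ** 2 for pi, ai in zip(p, a))
    return total / sum(weights)
```

With an uninformative query the posterior falls to the chance level of one over the pool size, while a query aligned with the observed target raises it above chance. Uniform post-training corresponds to passing equal weights to `weighted_regression_loss`.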

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning