Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting
Wanpeng Zhang, Hao Luo, Sipeng Zheng, Yicheng Feng, Haiweng Xu, Ziheng Xi, Chaoyi Xu, Haoqi Yuan, Zongqing Lu

TL;DR
This paper introduces Posterior-Transition Reweighting (PTR), an offline post-training method for robot policies that selectively emphasizes more reliable training samples, improving robustness on heterogeneous datasets.
Contribution
The paper proposes PTR, a reward-free, conservative post-training approach that reweights training samples based on their attributable post-action consequences, enhancing offline robot policy learning.
Findings
PTR improves policy robustness on heterogeneous datasets.
The method is compatible with diffusion and flow-matching action heads.
PTR outperforms uniform weighting in offline adaptation tasks.
Abstract
Offline post-training adapts a pretrained robot policy to a target dataset by supervised regression on recorded actions. In practice, robot datasets are heterogeneous: they mix embodiments, camera setups, and demonstrations of varying quality, so many trajectories reflect recovery behavior, inconsistent operator skill, or weakly informative supervision. Uniform post-training gives equal credit to all samples and can therefore average over conflicting or low-attribution data. We propose Posterior-Transition Reweighting (PTR), a reward-free and conservative post-training method that decides how much each training sample should influence the supervised update. For each sample, PTR encodes the observed post-action consequence as a latent target, inserts it into a candidate pool of mismatched targets, and uses a separate transition scorer to estimate a softmax identification posterior over…
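The identification step described in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it says how the posterior becomes a sample weight, so everything past the softmax is an assumption made for illustration: the function names, the chance-level ratio, and the clipping bounds `w_min` and `w_max` are hypothetical and not taken from the paper.

```python
import math

def identification_posterior(scores, true_index=0):
    """Softmax probability that the true post-action target is picked
    out of a candidate pool, given one scorer output per candidate."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[true_index] / sum(exps)

def conservative_weight(posterior, pool_size, w_min=0.5, w_max=2.0):
    """ASSUMPTION: map the posterior to a weight by comparing it with the
    chance level 1 / pool_size, then clip so that no sample is dropped
    entirely or allowed to dominate. Not the paper's formula."""
    ratio = posterior * pool_size  # equals 1.0 at chance level
    return min(w_max, max(w_min, ratio))

def reweighted_loss(per_sample_losses, weights):
    """Weighted mean of per-sample supervised losses."""
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / sum(weights)

# A scorer that cannot tell the true target from 3 mismatched ones
# yields the chance-level posterior, so the weight stays neutral.
p = identification_posterior([0.0, 0.0, 0.0, 0.0])
w = conservative_weight(p, pool_size=4)  # p = 0.25, w = 1.0
```

Under these assumptions, a sample whose consequence is easy to identify is weighted up to at most `w_max`, while one that looks like a mismatched target is weighted down to no less than `w_min`. The clipping is one possible reading of "conservative": the update never strays far from uniform weighting.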
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
