SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets

Shenghua Wan; Ziyuan Chen; Le Gan; Shuai Feng; De-Chuan Zhan

arXiv:2406.09486 · cs.CV · June 17, 2024

Open Access

TL;DR

SeMOPO is a model-based offline RL method that decomposes latent states into endogenous and exogenous parts and estimates model uncertainty only on the endogenous part, which improves learning from low-quality visual datasets that contain distractors.

Contribution

The paper proposes SeMOPO, which decomposes latent states so that model uncertainty is estimated on the endogenous part only. It comes with a theoretical guarantee on model uncertainty and a performance bound, and it outperforms baseline methods on the LQV-D4RL datasets.

Findings

01

SeMOPO outperforms baseline methods on LQV-D4RL datasets.

02

The method effectively handles distractors in high-dimensional visual data.

03

Theoretical performance bounds are established for SeMOPO.

Abstract

Model-based offline reinforcement learning (RL) is a promising approach that leverages existing data effectively in many real-world applications, especially those involving high-dimensional inputs like images and videos. To alleviate the distribution shift issue in offline RL, existing model-based methods rely heavily on the uncertainty of learned dynamics. However, model uncertainty estimation becomes significantly biased when observations contain complex distractors with non-trivial dynamics. To address this challenge, we propose a new approach, Separated Model-based Offline Policy Optimization (SeMOPO), which decomposes latent states into endogenous and exogenous parts via conservative sampling and estimates model uncertainty on the endogenous states only. We provide a theoretical guarantee on the model uncertainty and a performance bound for SeMOPO. To assess the efficacy, we…
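The idea of penalizing uncertainty on the endogenous part only can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how uncertainty is estimated, so the sketch assumes ensemble disagreement as the uncertainty measure and a MOPO-style penalized reward, r - lam * u. The function names, the `lam` coefficient, and the convention that the first `endo_dim` latent coordinates are endogenous are all hypothetical; in SeMOPO the separation is learned rather than given by a fixed slice.

```python
import math


def endogenous_uncertainty(ensemble_preds, endo_dim):
    """Ensemble disagreement measured on the endogenous coordinates only.

    ensemble_preds: one predicted next-latent vector per ensemble member
                    (a list of equal-length lists of floats).
    endo_dim:       assumed number of leading coordinates that are endogenous
                    (a hypothetical layout, chosen for illustration).
    """
    # Drop the exogenous (distractor) coordinates before comparing members.
    endo = [pred[:endo_dim] for pred in ensemble_preds]
    n_models = len(endo)
    # Coordinate-wise mean of the ensemble predictions.
    mean = [sum(column) / n_models for column in zip(*endo)]
    # Disagreement: largest Euclidean distance from any member to the mean.
    return max(math.dist(pred, mean) for pred in endo)


def penalized_reward(reward, ensemble_preds, endo_dim, lam=1.0):
    """MOPO-style reward penalty that uses endogenous uncertainty only."""
    return reward - lam * endogenous_uncertainty(ensemble_preds, endo_dim)


# Three ensemble members agree on the 2 endogenous coordinates but
# disagree on the 2 exogenous (distractor) coordinates.
preds = [
    [0.5, -0.2, 0.0, 0.0],
    [0.5, -0.2, 1.0, 1.0],
    [0.5, -0.2, 2.0, 2.0],
]
u_endo = endogenous_uncertainty(preds, endo_dim=2)  # ignores distractors
u_full = endogenous_uncertainty(preds, endo_dim=4)  # includes distractors
print(u_endo, u_full)
```

In this toy input the members disagree only about the distractor coordinates, so the endogenous-only uncertainty is zero while the full-state uncertainty is positive. That gap is the kind of bias the abstract attributes to distractors with non-trivial dynamics.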

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications