SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets
Shenghua Wan, Ziyuan Chen, Le Gan, Shuai Feng, De-Chuan Zhan

TL;DR
SeMOPO is a model-based offline RL method that decomposes latent states into endogenous and exogenous parts and estimates model uncertainty only on the endogenous part, which improves learning from low-quality visual datasets that contain distractors.
Contribution
The paper proposes SeMOPO, a new approach that decomposes latent states to better estimate model uncertainty, with theoretical guarantees and superior performance on challenging visual datasets.
Findings
SeMOPO outperforms baseline methods on LQV-D4RL datasets.
The method effectively handles distractors in high-dimensional visual data.
Theoretical performance bounds are established for SeMOPO.
Abstract
Model-based offline reinforcement learning (RL) is a promising approach that leverages existing data effectively in many real-world applications, especially those involving high-dimensional inputs like images and videos. To alleviate the distribution shift issue in offline RL, existing model-based methods rely heavily on the uncertainty of learned dynamics. However, model uncertainty estimation becomes significantly biased when observations contain complex distractors with non-trivial dynamics. To address this challenge, we propose a new approach, Separated Model-based Offline Policy Optimization (SeMOPO), which decomposes latent states into endogenous and exogenous parts via conservative sampling and estimates model uncertainty on the endogenous states only. We provide a theoretical guarantee on the model uncertainty and a performance bound for SeMOPO. To assess the efficacy, we…
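The idea of penalizing rewards with uncertainty measured only on the endogenous part of the latent state can be illustrated with a small sketch. This is not the authors' implementation: the ensemble-disagreement measure, the function names `disagreement` and `penalized_reward`, the penalty weight `lam`, and the convention that the first `endo_dim` latent dimensions are endogenous are all assumptions made for illustration. The abstract does not specify how SeMOPO computes its uncertainty term or how the decomposition is learned.

```python
import math
from typing import Sequence


def disagreement(preds: Sequence[Sequence[float]]) -> float:
    """Ensemble disagreement: the largest L2 distance between any
    member's predicted next latent state and the ensemble mean.
    (An assumed, generic uncertainty proxy, not the paper's estimator.)"""
    n = len(preds)
    dim = len(preds[0])
    mean = [sum(p[i] for p in preds) / n for i in range(dim)]
    return max(
        math.sqrt(sum((p[i] - mean[i]) ** 2 for i in range(dim)))
        for p in preds
    )


def penalized_reward(
    reward: float,
    preds: Sequence[Sequence[float]],
    endo_dim: int,
    lam: float = 1.0,
) -> float:
    """Subtract an uncertainty penalty computed on the endogenous
    dimensions only (assumed here to be the first `endo_dim` entries)."""
    endo_preds = [p[:endo_dim] for p in preds]
    return reward - lam * disagreement(endo_preds)


# Hypothetical ensemble predictions for one transition.
# Dimensions 0-1: endogenous (members agree).
# Dimensions 2-3: exogenous distractor (members disagree strongly).
preds = [
    [0.5, 0.25, 3.0, -2.0],
    [0.5, 0.25, -1.0, 4.0],
    [0.5, 0.25, 0.5, 0.5],
]

full_u = disagreement(preds)                    # inflated by the distractor
endo_u = disagreement([p[:2] for p in preds])   # unaffected by the distractor
r_endo = penalized_reward(1.0, preds, endo_dim=2)
r_full = penalized_reward(1.0, preds, endo_dim=4)
```

In this toy example the ensemble agrees on the task-relevant dimensions, so the endogenous-only penalty leaves the reward essentially unchanged, while a penalty over the full latent state would heavily discount the same transition purely because of distractor dynamics. That is the bias the abstract describes.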
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications
