Contextual Conservative Q-Learning for Offline Reinforcement Learning
Ke Jiang, Jiayu Yao, Xiaoyang Tan

TL;DR
This paper introduces Contextual Conservative Q-Learning (C-CQL), a method that enhances offline reinforcement learning by leveraging inverse dynamics models to improve policy robustness and reliability against out-of-distribution states.
Contribution
C-CQL is a novel approach that incorporates contextual information via inverse dynamics to reduce extrapolation error and improve policy stability in offline RL.
Findings
C-CQL achieves state-of-the-art results on offline MuJoCo benchmarks.
C-CQL outperforms existing methods in noisy MuJoCo settings.
Theoretical analysis shows C-CQL generalizes CQL and SDC.
Abstract
Offline reinforcement learning learns an effective policy from offline datasets without online interaction, and it attracts persistent research attention due to its potential for practical application. However, extrapolation error caused by distribution shift still leads to overestimation of actions that transition to out-of-distribution (OOD) states, which degrades the reliability and robustness of the offline policy. In this paper, we propose Contextual Conservative Q-Learning (C-CQL) to learn a robustly reliable policy through the contextual information captured via an inverse dynamics model. Under the supervision of the inverse dynamics model, it tends to learn a policy that generates stable transitions at perturbed states, since perturbed states are a common kind of OOD state. In this manner, we make the learnt policy more likely to generate transitions that…
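To make the notion of an inverse dynamics model concrete, here is a minimal sketch in Python. It is not the paper's implementation or loss: the linear model, the ridge-regression fit, the toy dynamics s' = s + a, and the names `fit_inverse_dynamics` and `predict_action` are all assumptions chosen for illustration. It only shows the general idea that an inverse dynamics model maps a (state, next state) pair to the action connecting them, and that it can be queried at a perturbed state to find an action that still leads to an in-distribution next state.

```python
import numpy as np

def _features(states, next_states):
    # Concatenate s, s' and a bias column (hypothetical feature choice).
    ones = np.ones((len(states), 1))
    return np.hstack([states, next_states, ones])

def fit_inverse_dynamics(states, actions, next_states, ridge=1e-6):
    """Fit a linear inverse dynamics model a ~ [s, s', 1] @ W by ridge regression."""
    X = _features(states, next_states)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ actions)

def predict_action(W, states, next_states):
    """Predict the action that takes each state to the given next state."""
    return _features(states, next_states) @ W

# Toy offline dataset with assumed dynamics s' = s + a (illustration only).
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 3))
actions = rng.normal(size=(500, 3))
next_states = states + actions

W = fit_inverse_dynamics(states, actions, next_states)

# On the dataset itself, the model should recover the logged actions.
recovered = predict_action(W, states, next_states)
max_error = float(np.abs(recovered - actions).max())

# At perturbed states, query the action that still reaches the
# in-distribution next state observed in the dataset.
perturbed = states + 0.01 * rng.normal(size=states.shape)
corrective = predict_action(W, perturbed, next_states)
landing_error = float(np.abs(perturbed + corrective - next_states).max())
```

In this toy setting both `max_error` and `landing_error` should be close to zero, because the assumed dynamics are exactly linear. With real MuJoCo data a nonlinear model would be needed, and how such a model supervises the policy in C-CQL is specified in the paper rather than here.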
Taxonomy
Topics: Reinforcement Learning in Robotics
