Strategically Conservative Q-Learning
Yutaka Shimizu, Joey Hong, Sergey Levine, Masayoshi Tomizuka

TL;DR
This paper introduces Strategically Conservative Q-Learning (SCQ), a novel offline RL method that balances conservatism and optimism by distinguishing between easy-to-estimate and hard-to-estimate out-of-distribution (OOD) data, leading to improved policy performance.
Contribution
SCQ offers a new framework that reduces unnecessary pessimism in value estimates by treating OOD data selectively, according to how hard it is to estimate, which makes offline RL more effective.
Findings
Outperforms state-of-the-art methods on D4RL benchmarks.
Provides theoretical guarantees of conservative value estimation.
Effectively balances interpolation and extrapolation in neural networks.
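The interpolation-versus-extrapolation point can be illustrated with a toy example that is not taken from the paper. It assumes a least-squares line as a stand-in for a neural network, fitted to samples of y = x² on [0, 1]. The fitted model is accurate inside that range and badly wrong far outside it. The names `fit_line`, `target`, and `abs_error` are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def target(x):
    # Ground-truth function the toy model tries to approximate.
    return x * x


# Training data only covers x in [0, 1].
train_x = [0.0, 0.25, 0.5, 0.75, 1.0]
slope, intercept = fit_line(train_x, [target(x) for x in train_x])


def abs_error(x):
    # Absolute gap between the fitted line and the true function at x.
    return abs(slope * x + intercept - target(x))


interp_err = abs_error(0.5)  # inside the training range (interpolation)
extrap_err = abs_error(3.0)  # far outside the training range (extrapolation)
print(slope, intercept, interp_err, extrap_err)  # 1.0 -0.125 0.125 6.125
```

The error grows from 0.125 to 6.125 once the query leaves the data's support. That gap is the intuition behind trusting a learned estimate near the data while staying cautious far from it.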
Abstract
Offline reinforcement learning (RL) is a compelling paradigm to extend RL's practical utility by leveraging pre-collected, static datasets, thereby avoiding the limitations associated with collecting online interactions. The major difficulty in offline RL is mitigating the impact of approximation errors when encountering out-of-distribution (OOD) actions; doing so ineffectively will lead to policies that prefer OOD actions, which can lead to unexpected and potentially catastrophic results. Despite the variety of works proposed to address this issue, they tend to excessively suppress the value function in and around OOD regions, resulting in overly pessimistic value estimates. In this paper, we propose a novel framework called Strategically Conservative Q-Learning (SCQ) that distinguishes between OOD data that is easy and hard to estimate, ultimately resulting in less conservative value…
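One way to picture "strategic" conservatism is a penalty that lowers Q-values only for OOD actions judged hard to estimate. The sketch below is hypothetical and is not SCQ's actual objective. It assumes that "hard to estimate" means farther than a chosen `threshold` from the nearest dataset action. It also uses a plain sum of Q-values as the penalty, loosely in the spirit of CQL-style regularizers. The names `hard_to_estimate_mask`, `conservative_penalty`, and `threshold` are illustrative.

```python
import math


def hard_to_estimate_mask(candidate_actions, dataset_actions, threshold):
    """Flag candidates farther than `threshold` from every dataset action.

    Hypothetical criterion: nearby actions count as interpolation (easy to
    estimate), distant ones as extrapolation (hard to estimate).
    """
    return [
        min(math.dist(a, d) for d in dataset_actions) > threshold
        for a in candidate_actions
    ]


def conservative_penalty(q_values, candidate_actions, dataset_actions,
                         threshold, selective=True):
    """Sum of the Q-values that minimising this term would push down.

    selective=False penalises every candidate (uniform conservatism);
    selective=True penalises only the hard-to-estimate candidates.
    """
    if selective:
        mask = hard_to_estimate_mask(candidate_actions, dataset_actions, threshold)
    else:
        mask = [True] * len(candidate_actions)
    return sum(q for q, hard in zip(q_values, mask) if hard)


# Usage: 1-D actions; the dataset covers [0, 1].
dataset_actions = [(0.0,), (0.5,), (1.0,)]
candidates = [(0.25,), (0.75,), (3.0,)]
q_values = [1.0, 2.0, 10.0]

selective_pen = conservative_penalty(q_values, candidates, dataset_actions,
                                     threshold=0.3)
uniform_pen = conservative_penalty(q_values, candidates, dataset_actions,
                                   threshold=0.3, selective=False)
print(selective_pen, uniform_pen)  # 10.0 13.0
```

With the selective variant, the candidates at 0.25 and 0.75 lie between dataset actions and receive no downward pressure. Only the far-away action at 3.0 is penalised. The distance metric and threshold are free design choices in this sketch.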
Taxonomy
Topics: Cognitive Science and Mapping · Complex Systems and Decision Making · Online and Blended Learning
Methods: Q-Learning
