Strategically Conservative Q-Learning

Yutaka Shimizu; Joey Hong; Sergey Levine; Masayoshi Tomizuka

arXiv:2406.04534 · cs.LG · June 10, 2024

TL;DR

This paper introduces Strategically Conservative Q-Learning (SCQ), an offline RL method that balances conservatism and optimism by distinguishing out-of-distribution (OOD) data that is easy to estimate from OOD data that is hard to estimate, leading to improved policy performance.

Contribution

SCQ offers a new framework that reduces unnecessary pessimism in value estimates by treating easy-to-estimate and hard-to-estimate out-of-distribution (OOD) data differently, making offline RL more effective.

Findings

01 Outperforms state-of-the-art methods on D4RL benchmarks.

02 Provides theoretical guarantees of conservative value estimation.

03 Effectively balances interpolation and extrapolation in neural networks.

Abstract

Offline reinforcement learning (RL) is a compelling paradigm to extend RL's practical utility by leveraging pre-collected, static datasets, thereby avoiding the limitations associated with collecting online interactions. The major difficulty in offline RL is mitigating the impact of approximation errors when encountering out-of-distribution (OOD) actions; doing so ineffectively will lead to policies that prefer OOD actions, which can lead to unexpected and potentially catastrophic results. Despite the variety of works proposed to address this issue, they tend to excessively suppress the value function in and around OOD regions, resulting in overly pessimistic value estimates. In this paper, we propose a novel framework called Strategically Conservative Q-Learning (SCQ) that distinguishes between OOD data that is easy and hard to estimate, ultimately resulting in less conservative value…
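
The idea of penalizing only some OOD actions can be illustrated with a small sketch. This is not the SCQ objective from the paper: it assumes, purely for illustration, that "hard to estimate" means farther than a chosen `radius` from every dataset action, and it uses a simple CQL-style gap between Q-values on penalized actions and Q-values on dataset actions. The function names, the distance rule, and `radius` are all hypothetical.

```python
import numpy as np


def selective_penalty_weights(candidate_actions, dataset_actions, radius):
    """Return 1.0 for candidates farther than `radius` from every dataset
    action and 0.0 otherwise (hypothetical proxy for hard-to-estimate OOD)."""
    # Pairwise Euclidean distances, shape (num_candidates, num_dataset).
    diffs = candidate_actions[:, None, :] - dataset_actions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    nearest = dists.min(axis=1)
    return (nearest > radius).astype(np.float64)


def conservative_penalty(q_candidates, q_dataset, weights):
    """Weighted mean Q on penalized candidates minus mean Q on dataset
    actions. Minimizing this pushes down only the penalized Q-values."""
    total = weights.sum()
    pushed_down = (weights * q_candidates).sum() / total if total > 0 else 0.0
    return pushed_down - q_dataset.mean()


# Toy example: two dataset actions and two candidate actions in 2-D.
dataset_actions = np.array([[0.0, 0.0], [1.0, 0.0]])
candidate_actions = np.array([[0.1, 0.0], [3.0, 0.0]])

weights = selective_penalty_weights(candidate_actions, dataset_actions, radius=0.5)
penalty = conservative_penalty(
    q_candidates=np.array([5.0, 9.0]),
    q_dataset=np.array([1.0, 3.0]),
    weights=weights,
)
```

In this toy setup the candidate close to the data receives weight 0 and is left alone, while the distant candidate receives weight 1 and is the only one whose Q-value enters the penalty. A method that penalized every candidate equally would also suppress the nearby action, which is the excessive pessimism the abstract describes.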

Code & Models

Repositories

purewater0901/scq (PyTorch, official)

Taxonomy

Topics: Cognitive Science and Mapping · Complex Systems and Decision Making · Online and Blended Learning

Methods: Q-Learning