Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

Nirbhay Modhe; Qiaozi Gao; Ashwin Kalyan; Dhruv Batra; Govind Thattai; Gaurav Sukhatme

arXiv:2308.03882 · cs.LG · September 26, 2023 · 1 citation

TL;DR

This paper introduces a novel unseen state augmentation method in offline RL that enhances the exploitation of distant unseen states by leveraging value-informed perturbations and uncertainty filtering, leading to improved task performance.

Contribution

It proposes a new unseen state augmentation strategy that relaxes previous limitations, enabling better generalization to unseen states in offline RL.

Findings

01 Improved performance across multiple offline RL benchmarks.

02 Consistently lower average dataset Q-values, indicating more conservative estimates.

03 Enhanced ability to exploit unseen states beyond offline data.

Abstract

Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions. Model-free methods penalize values at all unseen actions, while model-based methods are able to further exploit unseen states via model rollouts. However, such methods are handicapped in their ability to find unseen states far away from the available offline data due to two factors -- (a) very short rollout horizons in models due to cascading model errors, and (b) model rollouts originating solely from states observed in offline data. We relax the second assumption and present a novel unseen state augmentation strategy to allow exploitation of unseen states where the learned model and value estimates generalize. Our strategy finds unseen states by value-informed perturbations of seen states followed by…
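The two steps named on this page, value-informed perturbation of seen states and uncertainty filtering (the latter taken from the TL;DR, since the abstract is cut off here), can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single gradient-ascent step, the ensemble-disagreement uncertainty measure, the threshold, and every function and parameter name below are assumptions chosen only for illustration.

```python
import numpy as np

def perturb_states(states, value_grad_fn, step_size=0.1):
    """Move each seen state one small step along the value gradient.
    The one-step gradient-ascent rule is an assumption, not the paper's rule."""
    return states + step_size * value_grad_fn(states)

def ensemble_disagreement(states, models):
    """Std-dev of ensemble predictions, maximised over output dimensions.
    Using ensemble disagreement as the uncertainty measure is an assumption."""
    preds = np.stack([m(states) for m in models], axis=0)  # (E, N, D)
    return preds.std(axis=0).max(axis=-1)                  # (N,)

def augment_unseen_states(states, value_grad_fn, models,
                          step_size=0.1, max_uncertainty=0.5):
    """Perturb seen states, then keep only candidates the ensemble agrees on."""
    candidates = perturb_states(states, value_grad_fn, step_size)
    keep = ensemble_disagreement(candidates, models) <= max_uncertainty
    return candidates[keep]

# Toy setup (entirely hypothetical): V(s) = -||s - goal||^2, so the
# gradient is -2 (s - goal); the ensemble members disagree more as
# states move away from the origin.
goal = np.array([1.0, 1.0])

def toy_value_grad(states):
    return -2.0 * (states - goal)

toy_models = [lambda s, c=c: s + c * s**2 for c in (-0.1, 0.0, 0.1)]

seen_states = np.array([[0.0, 0.0], [3.0, 3.0]])
augmented = augment_unseen_states(seen_states, toy_value_grad, toy_models)
print(augmented)
```

In this toy setup the state near the origin is nudged toward the goal and kept, while the candidate derived from the far state is dropped because the ensemble members disagree about it by more than the threshold. In practice the surviving candidates would serve as additional starting states for model rollouts, which is the role the abstract assigns to unseen states.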

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Software Engineering Research