Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations
Nirbhay Modhe, Qiaozi Gao, Ashwin Kalyan, Dhruv Batra, Govind Thattai, Gaurav Sukhatme

TL;DR
This paper introduces a novel unseen state augmentation method in offline RL that enhances the exploitation of distant unseen states by leveraging value-informed perturbations and uncertainty filtering, leading to improved task performance.
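The two ingredients named in the TL;DR, value-informed perturbation and uncertainty filtering, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume the perturbation is a single normalized step of size `step_size` along the gradient of a learned value function (estimated here by finite differences). We also assume the filter keeps only candidates whose ensemble disagreement falls inside a band [`unc_low`, `unc_high`]. The functions `value_gradient` and `augment_states`, the ensemble interface (each member maps a batch of states to predicted next states, with the policy's action assumed folded in), and all thresholds are hypothetical.

```python
import numpy as np

def value_gradient(value_fn, states, eps=1e-4):
    """Central finite-difference estimate of dV/ds for each row of `states`."""
    grads = np.zeros_like(states, dtype=float)
    for j in range(states.shape[1]):
        offset = np.zeros(states.shape[1])
        offset[j] = eps
        grads[:, j] = (value_fn(states + offset) - value_fn(states - offset)) / (2 * eps)
    return grads

def augment_states(states, value_fn, ensemble, step_size=0.1, unc_low=0.01, unc_high=1.0):
    """Hypothetical sketch: perturb seen states toward higher value, then band-filter by uncertainty."""
    # Value-informed perturbation (assumed form): one normalized step along dV/ds.
    grads = value_gradient(value_fn, states)
    norms = np.linalg.norm(grads, axis=1, keepdims=True) + 1e-8
    candidates = states + step_size * grads / norms
    # Epistemic-uncertainty proxy: std across ensemble predictions, averaged over state dims.
    preds = np.stack([member(candidates) for member in ensemble])  # shape (M, N, d)
    uncertainty = preds.std(axis=0).mean(axis=1)                   # shape (N,)
    # Keep only candidates whose uncertainty lies inside the assumed band.
    keep = (uncertainty >= unc_low) & (uncertainty <= unc_high)
    return candidates[keep]

# Usage with a toy linear value function and a two-member "ensemble".
states = np.zeros((5, 2))
value_fn = lambda s: s.sum(axis=1)
augmented = augment_states(states, value_fn, [lambda s: s, lambda s: s + 0.5])
```

In this sketch the lower bound discards candidates the ensemble already agrees on (plausibly too close to seen data), while the upper bound discards candidates where the models disagree strongly (plausibly where the model no longer generalizes). Both bounds and their interpretation are assumptions.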
Contribution
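It proposes an unseen state augmentation strategy that relaxes a limitation of prior model-based methods, namely that model rollouts originate only from states observed in the offline data, enabling better generalization to unseen states in offline RL.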
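Relaxing where model rollouts start can be illustrated with a short sketch. We assume a common model-based setup in which a learned model is rolled out for only a few steps (to limit compounding model error) from a batch of start states. The only change from seen-state-only rollouts is that start states are drawn from both the dataset and an augmented set. `sample_start_states`, `short_rollouts`, `aug_fraction`, `horizon`, and the `model(s, a) -> (next_state, reward)` interface are hypothetical names chosen for illustration.

```python
import numpy as np

def sample_start_states(seen, augmented, n, aug_fraction=0.5, rng=None):
    """Draw n rollout start states from dataset states and augmented (unseen) states."""
    rng = np.random.default_rng() if rng is None else rng
    # Fall back to seen states only when no augmented states survived filtering.
    n_aug = int(round(n * aug_fraction)) if len(augmented) > 0 else 0
    n_seen = n - n_aug
    parts = [seen[rng.integers(0, len(seen), size=n_seen)]]
    if n_aug > 0:
        parts.append(augmented[rng.integers(0, len(augmented), size=n_aug)])
    return np.concatenate(parts, axis=0)

def short_rollouts(model, policy, starts, horizon=5):
    """Roll the learned model forward a few steps; return one (s, a, r, s') batch per step."""
    transitions = []
    s = starts
    for _ in range(horizon):
        a = policy(s)
        s_next, r = model(s, a)
        transitions.append((s, a, r, s_next))
        s = s_next
    return transitions

# Usage with toy dynamics: next state = s + a, reward = sum of action components.
rng = np.random.default_rng(0)
starts = sample_start_states(np.zeros((10, 2)), np.ones((10, 2)), n=8, rng=rng)
model = lambda s, a: (s + a, a.sum(axis=1))
policy = lambda s: np.ones_like(s)
traj = short_rollouts(model, policy, starts, horizon=3)
```

The rollout horizon stays short in this sketch; only the start-state distribution changes, which is how it reaches states farther from the data without lengthening rollouts.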
Findings
Improved performance across multiple offline RL benchmarks.
Consistently lower average Q-values on the offline dataset, indicating more conservative value estimates.
Enhanced ability to exploit unseen states beyond the offline data.
Abstract
Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation: penalizing values of unseen states and actions. Model-free methods penalize values at all unseen actions, while model-based methods are able to further exploit unseen states via model rollouts. However, such methods are handicapped in their ability to find unseen states far away from the available offline data due to two factors: (a) very short rollout horizons in models due to cascading model errors, and (b) model rollouts originating solely from states observed in the offline data. We relax the second limitation and present a novel unseen state augmentation strategy to allow exploitation of unseen states where the learned model and value estimates generalize. Our strategy finds unseen states by value-informed perturbations of seen states followed by…
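To make "penalizing values of unseen states and actions" concrete for the model-based case, here is a sketch of a COMBO-style regularizer. It lowers Q-values on model-generated state-action pairs (which may be unseen) and raises them on dataset pairs. It is not claimed to be the objective used in this paper. `conservative_penalty`, `bellman_mse`, `q_fn`, and `beta` are hypothetical names, and NumPy is used only to show the arithmetic; training would compute the same quantities in an autodiff framework.

```python
import numpy as np

def conservative_penalty(q_fn, data_s, data_a, model_s, model_a, beta=1.0):
    """beta * (mean Q on model samples - mean Q on dataset samples).

    Minimized w.r.t. Q's parameters, this term pushes Q down on model-generated
    (possibly unseen) pairs and up on pairs observed in the offline data.
    """
    q_model = np.mean(q_fn(model_s, model_a))
    q_data = np.mean(q_fn(data_s, data_a))
    return beta * (q_model - q_data)

def bellman_mse(q_pred, rewards, next_q, dones, gamma=0.99):
    """Mean squared TD error against a one-step bootstrapped target."""
    targets = rewards + gamma * (1.0 - dones) * next_q
    return np.mean((q_pred - targets) ** 2)

# Usage with a toy Q-function: Q(s, a) = sum(s) + sum(a).
q_fn = lambda s, a: s.sum(axis=1) + a.sum(axis=1)
penalty = conservative_penalty(q_fn, np.zeros((4, 2)), np.zeros((4, 1)),
                               np.ones((4, 2)), np.ones((4, 1)), beta=0.5)
td = bellman_mse(np.array([1.0, 2.0]), np.array([1.0, 1.0]),
                 np.array([0.0, 1.0]), np.array([0.0, 1.0]), gamma=0.9)
total_loss = td + penalty
```

In such a setup, augmented start states only widen the set of model-generated pairs the penalty sees. The conservatism mechanism itself is unchanged.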
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Software Engineering Research
