DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
Aayam Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern

TL;DR
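This paper introduces the Deep Averagers with Costs MDP (DAC-MDP), a non-parametric model for offline RL that builds on deep representations and accounts for limited data by penalizing the exploitation of under-represented parts of the model. Experiments indicate that the approach scales to large, complex environments, including ones with image-based observations.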
Contribution
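The main contribution is the DAC-MDP, a non-parametric model for offline RL that can leverage deep representations and accounts for limited data by adding costs for exploiting under-represented parts of the model. The paper also gives conditions under which the performance of DAC-MDP solutions can be lower-bounded.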
Findings
DAC-MDP can be applied on top of learned representations.
The approach has the potential to support zero-shot adjustment to changing environments and goals.
Empirical results show scalability to large, complex offline RL tasks.
Abstract
We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience. This approach can be applied on top of any learned representation and has the potential to easily support multiple solution objectives as well as zero-shot adjustment to changing environments and goals. Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL. DAC-MDPs are a non-parametric model that can leverage deep representations and account for limited data by introducing costs for exploiting under-represented parts of the model. In theory, we show conditions that allow for lower-bounding the performance of DAC-MDP solutions. We also investigate the empirical behavior in a number of environments, including those with image-based observations. Overall, the…
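The abstract describes the model only in words, so the following is a minimal sketch of the general idea rather than the authors' implementation. It derives a finite MDP from a static dataset and solves it by value iteration. Several details are assumptions made for illustration: the use of k-nearest-neighbor averaging, a penalty proportional to the distance to each neighbor, the hypothetical names `build_dac_mdp`, `value_iteration`, `k` and `cost`, the absence of terminal-state handling, and the use of raw 1-D positions in place of a learned deep representation. Every action is assumed to appear in at least `k` transitions.

```python
import numpy as np

def build_dac_mdp(states, actions, rewards, next_states, n_actions, k=1, cost=1.0):
    """Derive a finite MDP whose states are the dataset's next-state vectors.

    Returns (R, NN). R[i, a] is the averaged, distance-penalized reward for
    action a in derived state i. NN[i, a] lists the k dataset transitions whose
    outcomes are averaged, each taken with probability 1/k.
    """
    n = len(next_states)
    R = np.zeros((n, n_actions))
    NN = np.zeros((n, n_actions, k), dtype=int)
    for a in range(n_actions):
        idx = np.flatnonzero(actions == a)  # transitions that used action a
        # Distance from every derived state to the source state of each such transition.
        diff = next_states[:, None, :] - states[idx][None, :, :]
        d = np.linalg.norm(diff, axis=-1)
        nearest = np.argsort(d, axis=1)[:, :k]  # k closest transitions per derived state
        d_k = np.take_along_axis(d, nearest, axis=1)
        NN[:, a, :] = idx[nearest]
        # Assumed penalty: the farther the supporting data, the lower the reward.
        R[:, a] = (rewards[idx][nearest] - cost * d_k).mean(axis=1)
    return R, NN

def value_iteration(R, NN, gamma=0.9, iters=200):
    """Solve the derived MDP; derived state j is the outcome of transition j."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + gamma * V[NN].mean(axis=-1)
        V = Q.max(axis=1)
    return Q

# Toy dataset on positions 0..3; action 1 moves right, action 0 moves left.
# Moving right from position 3 never appears in the data.
states = np.array([[0.0], [1.0], [2.0], [1.0], [2.0], [3.0]])
actions = np.array([1, 1, 1, 0, 0, 0])
rewards = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
next_states = np.array([[1.0], [2.0], [3.0], [0.0], [1.0], [2.0]])

R_free, NN_free = build_dac_mdp(states, actions, rewards, next_states, 2, k=1, cost=0.0)
R_pen, NN_pen = build_dac_mdp(states, actions, rewards, next_states, 2, k=1, cost=1.0)
Q_free = value_iteration(R_free, NN_free)
Q_pen = value_iteration(R_pen, NN_pen)

# Greedy action in the derived state at position 3 (index 2), without and with the cost.
print(int(Q_free[2].argmax()), int(Q_pen[2].argmax()))
```

In this toy data, the unpenalized model borrows the reward of the nearest "move right" transition for position 3, where that action was never observed, while the penalized model discounts it by the distance to that transition. Because the derived MDP is a small table, changing `cost`, `gamma`, or the reward vector and re-running `value_iteration` needs no further learning, which is one way to read the zero-shot adjustment mentioned in the abstract.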
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control
