DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs

Aayam Shrestha; Stefan Lee; Prasad Tadepalli; Alan Fern

arXiv:2010.08891 · cs.LG · February 6, 2025 · 5 cites

TL;DR

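This paper introduces the Deep Averagers with Costs MDP (DAC-MDP), a non-parametric model for offline RL that builds on learned deep representations and accounts for limited data by penalizing the exploitation of under-represented parts of the model. Experiments cover a number of environments, including ones with image-based observations.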

Contribution

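The main contribution is the DAC-MDP, a finitely-represented, non-parametric MDP derived from a static dataset of experience, together with an investigation of its optimal solutions for offline RL. The paper also gives conditions under which the performance of DAC-MDP solutions can be lower-bounded.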

Findings

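01 DAC-MDP can be applied on top of any learned representation.

02 The approach has the potential to support multiple solution objectives and zero-shot adjustment to changing environments and goals.

03 Empirical results show scalability to large, complex offline RL tasks, including environments with image-based observations.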

Abstract

We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience. This approach can be applied on top of any learned representation and has the potential to easily support multiple solution objectives as well as zero-shot adjustment to changing environments and goals. Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL. DAC-MDPs are a non-parametric model that can leverage deep representations and account for limited data by introducing costs for exploiting under-represented parts of the model. In theory, we show conditions that allow for lower-bounding the performance of DAC-MDP solutions. We also investigate the empirical behavior in a number of environments, including those with image-based observations. Overall, the…
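The idea of deriving a finite MDP from a static dataset and then solving it exactly can be illustrated with a small sketch. This is not the authors' implementation: the nearest-neighbor averaging rule, the distance-proportional cost, and every name below (`build_derived_mdp`, `value_iteration`, `k`, `cost`) are assumptions chosen for illustration. Each dataset next-state acts as a state of the derived MDP, each action averages over the `k` nearest stored transitions that used that action, and the reward is reduced in proportion to how far away those transitions are, so poorly covered regions look less attractive.

```python
import numpy as np

def build_derived_mdp(s, a, r, s_next, n_actions, k=2, cost=1.0):
    """Derive a finite MDP from transitions (s, a, r, s_next).

    Hypothetical construction: the derived MDP has one state per stored
    next-state. Assumes every action appears in at least k transitions.
    Returns successor indices and penalized rewards, each of shape
    (n_transitions, n_actions, k).
    """
    n = len(s)
    succ = np.zeros((n, n_actions, k), dtype=int)
    rew = np.zeros((n, n_actions, k))
    for act in range(n_actions):
        idx = np.flatnonzero(a == act)  # transitions that used this action
        # Distance from each derived state s_next[j] to each source state s[i].
        dist = np.linalg.norm(s_next[:, None, :] - s[idx][None, :, :], axis=-1)
        nearest = np.argsort(dist, axis=1)[:, :k]
        succ[:, act, :] = idx[nearest]
        # Penalize rewards by the distance to the neighbors being averaged.
        penalty = cost * np.take_along_axis(dist, nearest, axis=1)
        rew[:, act, :] = r[idx[nearest]] - penalty
    return succ, rew

def value_iteration(succ, rew, gamma=0.9, iters=200):
    """Solve the derived MDP; each of the k neighbors has probability 1/k."""
    v = np.zeros(succ.shape[0])
    for _ in range(iters):
        q = (rew + gamma * v[succ]).mean(axis=2)
        v = q.max(axis=1)
    return q, v

# Toy usage: the vectors stand in for learned representations.
rng = np.random.default_rng(0)
s = rng.normal(size=(20, 3))
s_next = s + 0.1 * rng.normal(size=(20, 3))
a = np.arange(20) % 2
r = rng.normal(size=20)
succ, rew = build_derived_mdp(s, a, r, s_next, n_actions=2, k=3, cost=1.0)
q, v = value_iteration(succ, rew)
greedy_actions = q.argmax(axis=1)
```

Because the solver only touches the derived tables, changing `cost`, `gamma`, or the reward vector `r` and re-running `value_iteration` needs no retraining of the representation, which is one way to read the abstract's claim about zero-shot adjustment.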

Code & Models

Videos

DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control