Off-Policy Reinforcement Learning with High Dimensional Reward

Dong Neuck Lee; Michael R. Kosorok

arXiv:2408.07660 · stat.ML · August 15, 2024

Open Access

TL;DR

This paper develops a theoretical foundation for distributional reinforcement learning with high-dimensional rewards, proving contraction properties and proposing a new algorithm for complex problems.

Contribution

It establishes the contraction property of the distributional Bellman operator in infinite-dimensional spaces and introduces a novel DRL algorithm for high-dimensional rewards.

Findings

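01. Proves the contraction property of the distributional Bellman operator when the reward space is an infinite-dimensional separable Banach space.

02. Shows that high- or infinite-dimensional returns can be approximated in a lower-dimensional Euclidean space.

03. Proposes a new DRL algorithm for problems that were previously intractable with conventional RL approaches.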

Abstract

Conventional off-policy reinforcement learning (RL) focuses on maximizing the expected return of scalar rewards. Distributional RL (DRL), in contrast, studies the distribution of returns with the distributional Bellman operator in a Euclidean space, leading to highly flexible choices for utility. This paper establishes robust theoretical foundations for DRL. We prove the contraction property of the Bellman operator even when the reward space is an infinite-dimensional separable Banach space. Furthermore, we demonstrate that the behavior of high- or infinite-dimensional returns can be effectively approximated using a lower-dimensional Euclidean space. Leveraging these theoretical insights, we propose a novel DRL algorithm that tackles problems which have been previously intractable using conventional reinforcement learning approaches.
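For readers unfamiliar with the contraction property mentioned in the abstract, the block below sketches the familiar scalar-reward statement from the DRL literature, in policy-evaluation form. The notation ($Z$, $\mathcal{T}^{\pi}$, $\bar{d}_p$, $W_p$, $\gamma$, $P$, $\pi$) is assumed here for illustration and is not taken from the paper. The paper's own result concerns rewards in a separable Banach space, and its exact metric and conditions may differ from this sketch.

```latex
% Assumed notation: Z(s,a) is the random return from state-action pair (s,a),
% R(s,a) the random reward, \gamma \in [0,1) the discount factor,
% P the transition kernel, and \pi the target policy.
\begin{align}
  (\mathcal{T}^{\pi} Z)(s,a)
    &\overset{D}{=} R(s,a) + \gamma\, Z(S', A'),
    \qquad S' \sim P(\cdot \mid s,a),\; A' \sim \pi(\cdot \mid S'), \\
  \bar{d}_p(Z_1, Z_2)
    &= \sup_{s,a} W_p\bigl(Z_1(s,a),\, Z_2(s,a)\bigr), \\
  \bar{d}_p\bigl(\mathcal{T}^{\pi} Z_1,\, \mathcal{T}^{\pi} Z_2\bigr)
    &\le \gamma\, \bar{d}_p(Z_1, Z_2).
\end{align}
% Here W_p is the p-Wasserstein distance between the laws of the two returns.
% The last line says \mathcal{T}^{\pi} is a \gamma-contraction, so repeated
% application converges to a unique fixed point, the return distribution of \pi.
```

In this scalar setting the inequality is what justifies iterating the operator to evaluate a policy. The abstract's claim is that an analogous guarantee survives when $R(s,a)$ takes values in an infinite-dimensional separable Banach space rather than in $\mathbb{R}$.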

Taxonomy

Topics: Reinforcement Learning in Robotics