Distributional Bellman Operators over Mean Embeddings

Li Kevin Wenliang; Grégoire Delétang; Matthew Aitchison; Marcus Hutter; Anian Ruoss; Arthur Gretton; Mark Rowland

arXiv:2312.07358 · stat.ML · March 5, 2024 · 1 citation


TL;DR

This paper introduces a new distributional reinforcement learning framework based on mean embeddings of return distributions. It offers new algorithms, asymptotic convergence guarantees and improved empirical performance, including a deep RL agent that improves over baseline distributional approaches on the Arcade Learning Environment.

Contribution

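The paper presents a novel framework for distributional RL based on mean embeddings of return distributions. It contributes new dynamic programming and temporal-difference algorithms, asymptotic convergence theory, and an integration into deep RL that improves over baseline distributional agents.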

Findings

1. The proposed algorithms demonstrate convergence on tabular tasks.
2. Empirical results show improved performance over baselines.
3. The deep RL agent improves over baseline distributional approaches on the Arcade Learning Environment.

Abstract

We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework, provide asymptotic convergence theory, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning, and obtain a new deep RL agent that improves over baseline distributional approaches on the Arcade Learning Environment.
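The core idea in the abstract can be illustrated with a small NumPy sketch, given here under stated assumptions.

- Each state's return distribution is represented by a mean embedding U[x] ≈ E[φ(G(x))].
- The distributional Bellman update is replaced by a linear map B_r that satisfies φ(r + γg) ≈ B_r φ(g) over a grid of anchor returns g. Reviewer 02 below calls this the Bellman coefficient matrix B_r.
- Here B_r is fitted by least squares. That is an assumption about how such a matrix might be computed.
- The feature map is polynomial, φ(z) = (1, z, z²). This is a hypothetical choice, made because the linear relation is then exact and the outputs can be checked by hand. It is not the paper's feature map (its equation (8)). With other feature maps, the least-squares fit is only approximate.
- All helper names (features, bellman_coefficients, sketch_dp, sketch_td_update) are hypothetical and do not come from the official JAX code in google-deepmind/sketch_dqn.

```python
import numpy as np

# Hypothetical feature map phi(z) = (1, z, z^2); chosen so B_r is exact, NOT the paper's eq. (8).
DEGREE = 2


def features(z):
    """phi: map returns z (shape [n]) to feature vectors (shape [n, DEGREE + 1])."""
    z = np.asarray(z, dtype=float)
    return np.stack([z ** k for k in range(DEGREE + 1)], axis=-1)


def bellman_coefficients(r, gamma, grid):
    """Least-squares matrix B_r with phi(r + gamma * g) ~= B_r @ phi(g) on grid points g."""
    X = features(grid)              # [n, m]: features of anchor returns g
    Y = features(r + gamma * grid)  # [n, m]: features of transformed returns r + gamma * g
    Bt, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves X @ Bt ~= Y
    return Bt.T                     # [m, m]


def sketch_dp(P, R, gamma, grid, num_iters=200):
    """Dynamic programming on mean embeddings: U[x] <- sum_y P[x, y] * B_{R[x, y]} @ U[y].

    P[x, y]: transition probabilities of a Markov reward process (policy already fixed).
    R[x, y]: deterministic reward received on the transition x -> y.
    """
    n = P.shape[0]
    B = {(x, y): bellman_coefficients(R[x, y], gamma, grid)
         for x in range(n) for y in range(n) if P[x, y] > 0}
    U = np.tile(features([0.0])[0], (n, 1))  # start every state at a point mass on return 0
    for _ in range(num_iters):
        U = np.stack([sum(P[x, y] * (B[x, y] @ U[y]) for y in range(n) if P[x, y] > 0)
                      for x in range(n)])
    return U


def sketch_td_update(U, x, r, y, gamma, lr, grid):
    """One TD step on a sampled transition (x, r, y): U[x] += lr * (B_r @ U[y] - U[x])."""
    target = bellman_coefficients(r, gamma, grid) @ U[y]
    U = U.copy()
    U[x] += lr * (target - U[x])
    return U


grid = np.linspace(-5.0, 5.0, 101)

# State 0 moves to state 1 (reward +1) or state 2 (reward -1) with equal probability;
# state 1 moves to state 2 with reward 0; state 2 is absorbing with reward 0.
P = np.array([[0.0, 0.5, 0.5], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
R = np.array([[0.0, 1.0, -1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
U = sketch_dp(P, R, gamma=0.9, grid=grid)
print(U[0])  # approximately [1, 0, 1]: return is +1 or -1, so mean 0 and second moment 1

# TD on a single state with a self-loop, reward 1 and gamma 0.5 (deterministic return 2).
U_td = features([0.0])
for _ in range(500):
    U_td = sketch_td_update(U_td, 0, 1.0, 0, gamma=0.5, lr=0.5, grid=grid)
print(U_td[0])  # approximately [1, 2, 4]
```

The embedding for state 0 has mean zero but a nonzero second moment, which is the distributional information an expected-value method would discard. In a practical agent, B_r would typically be precomputed or cached per reward value rather than refitted on every TD step as in this sketch.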

Peer Reviews

Decision: ICML 2024 Poster

Reviewer 01 · Rating: 8 (accept, good paper) · Confidence: 3

Strengths

This paper satisfies all the criteria of an excellent paper:
1. It is exceptionally clearly written. Reading it and learning from it was a joy. It does a great job of being thorough without being pedantic in its discussion. I really appreciated the concrete discussion in Sections 3.2 and 4.1; it dovetails quite nicely with the rest of the paper.
2. It makes a useful contribution to the fun and important problem of distributional reinforcement learning. The idea is simple yet elegant.

Weaknesses

The formulation of feature maps used in the paper (equation (8)) is sort of "pulled out of a hat". It would be useful to justify why that formulation made sense as opposed to other possibilities. See also the questions below.

Reviewer 02 · Rating: 5 (marginally below the acceptance threshold) · Confidence: 4

Strengths

1. The paper introduces a novel framework for distributional reinforcement learning based on learning mean embeddings of return distributions. This approach avoids the need for imputation strategies, which can be computationally expensive and biologically implausible.
2. The authors provide a theoretical analysis of the proposed algorithms, including asymptotic convergence results. This analysis helps to establish the theoretical foundations of the approach and provides insig…

Weaknesses

1. The proposed method requires a linear approximation for the Bellman update equation and requires calculating a Bellman coefficient matrix $B_r$, which can be computationally challenging. This contradicts the motivation of improving computational efficiency and reducing imputation error by operating purely in the sketch space. It is unclear how the proposed method is superior to previous methods, either computationally or statistically.
2. The experimental validation is limited. The experiment on th…

Reviewer 03 · Rating: 6 (marginally above the acceptance threshold) · Confidence: 3

Strengths

1) The authors provide detailed experiments, particularly ablations over different feature functions.
2) The authors provide theoretical guarantees, which add credibility to their proposed method.

Weaknesses

1) The authors' motivation for using the sketch Bellman operator is to reduce the need for expensive imputation strategies when converting between sketches and distributions. However, the experiments do not verify this claim. It would be helpful if the authors could provide quantitative results (such as training time) to demonstrate this reduction.
2) The proposed method performs worse than IQN, even though the authors claim that IQN uses a more complex prediction network for non-parametr…

Code & Models

Repositories

google-deepmind/sketch_dqn (JAX, official)


Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control