Distributional Bellman Operators over Mean Embeddings
Li Kevin Wenliang, Grégoire Delétang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, Mark Rowland

TL;DR
This paper introduces a distributional reinforcement learning framework based on finite-dimensional mean embeddings of return distributions, with new dynamic programming and temporal-difference algorithms, asymptotic convergence theory, and a deep RL agent that improves over baseline distributional approaches on the Arcade Learning Environment.
Contribution
It presents a novel mean embedding-based framework for distributional RL, with new algorithms, convergence theory, and successful integration into deep RL for improved results.
Findings
Algorithms demonstrate convergence in tabular tasks
Empirical results show improved performance over baselines
Deep RL agent improves over baseline distributional approaches on the Arcade Learning Environment
Abstract
We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework, provide asymptotic convergence theory, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning, and obtain a new deep RL agent that improves over baseline distributional approaches on the Arcade Learning Environment.
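To make the idea of running dynamic programming directly on mean embeddings concrete, the following is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes polynomial features φ(g) = (1, g, g²), chosen here only because the relation φ(r + γg) = B_r φ(g) then holds exactly; the paper's own feature maps are different, and for general features this relation would hold only approximately. The names `bellman_coefficients`, `sketch_dp`, and `TRANSITIONS`, and the toy one-state reward process, are hypothetical and used for illustration only.

```python
# Illustrative sketch: dynamic programming on mean embeddings U(s) = E[phi(G)],
# ASSUMING polynomial features phi(g) = (1, g, g^2). This feature choice is an
# assumption for illustration; it is not the feature map used in the paper.

def bellman_coefficients(r, gamma):
    """Return B_r such that phi(r + gamma * g) == B_r @ phi(g) exactly."""
    return [
        [1.0, 0.0, 0.0],                          # 1
        [r, gamma, 0.0],                          # r + gamma * g
        [r * r, 2.0 * r * gamma, gamma * gamma],  # (r + gamma * g)^2
    ]


def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def sketch_dp(transitions, gamma, n_iters=200):
    """Iterate U(s) <- sum over (p, r, s') of p * B_r U(s').

    transitions[s] is a list of (probability, reward, next_state) triples.
    """
    n_states = len(transitions)
    # Start from the embedding of a point mass at zero return: phi(0).
    U = [[1.0, 0.0, 0.0] for _ in range(n_states)]
    for _ in range(n_iters):
        new_U = []
        for s in range(n_states):
            acc = [0.0, 0.0, 0.0]
            for prob, reward, nxt in transitions[s]:
                pushed = matvec(bellman_coefficients(reward, gamma), U[nxt])
                acc = [a + prob * p for a, p in zip(acc, pushed)]
            new_U.append(acc)
        U = new_U
    return U


# Hypothetical toy problem: one state with a self-loop and a fair coin-flip
# reward of 0 or 1 at every step.
TRANSITIONS = [[(0.5, 0.0, 0), (0.5, 1.0, 0)]]
U = sketch_dp(TRANSITIONS, gamma=0.5)
mean = U[0][1]
variance = U[0][2] - mean ** 2
print(mean, variance)
```

In this toy case the learned embedding holds the first two moments of the return, so the mean and variance can be read off without ever reconstructing the full return distribution, which is the sense in which the update operates purely in embedding space.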
Peer Reviews
Decision: ICML 2024 Poster
This paper satisfies all the criteria of an excellent paper: 1. It is exceptionally clearly written. Reading it and learning from it was a joy. It does a great job at being thorough without being pedantic in its discussion. I really appreciated the concrete discussion in sections 3.2 and 4.1; it dovetails quite nicely with the rest of the paper. 2. It makes a useful contribution to the fun and important problem of distributional reinforcement learning. The idea is simple yet elegant.
The formulation of feature maps used in the paper (equation (8)) is sort of "pulled out of a hat". It would be useful to justify why it made sense to use that formulation as opposed to other possibilities. See also the questions below.
1. The paper introduces a novel framework for distributional reinforcement learning, which is based on learning mean embeddings of return distributions. This approach avoids the need for imputation strategies, which can be computationally expensive and biologically implausible. 2. The authors provide a theoretical analysis of the proposed algorithms, including asymptotic convergence results. This analysis helps to establish the theoretical foundations of the approach and provides insig
1. The proposed method requires a linear approximation for the Bellman update equation and requires calculating a Bellman coefficient matrix $B_r$, which can be computationally challenging. This contradicts the motivation to improve computational efficiency and reduce the imputation error by operating purely in the sketch space. It is unclear how the proposed method is superior to previous methods both computationally and statistically. 2. The experimental validation is limited. The experiment on th
1) The authors provide detailed experiments, particularly ablations on different feature functions. 2) The authors provide theoretical guarantees, which add credibility to their proposed method.
1) The authors' motivation for using the sketch Bellman operator is to reduce the need for expensive imputation strategies when converting between sketches and distributions. However, the experiment does not verify this claim. It would be helpful if the authors could provide some quantitative results (such as training time) to demonstrate this reduction. 2) The proposed method performs worse than IQN, even though the authors claim that IQN uses a more complex prediction network for non-parametr
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
