Vector-Valued Distributional Reinforcement Learning Policy Evaluation: A Hilbert Space Embedding Approach

Mehrdad Mohammadi; Qi Zheng; Ruoqing Zhu

arXiv:2601.18952 · cs.LG · January 28, 2026

Open Access

TL;DR

This paper introduces a Hilbert space embedding approach for multi-dimensional distributional reinforcement learning, enabling efficient policy evaluation in complex continuous state-action spaces by replacing Wasserstein metrics with kernel mean embeddings.

Contribution

It proposes a novel off-policy evaluation framework using kernel mean embeddings in Hilbert spaces, with theoretical guarantees and practical demonstrations.

Findings

01 Robust off-policy evaluation demonstrated in simulations.

02 Theoretical contraction properties of the distributional Bellman operator.

03 Effective estimation of multi-dimensional value distributions.

Abstract

We propose an (offline) multi-dimensional distributional reinforcement learning framework (KE-DRL) that leverages Hilbert space mappings to estimate the kernel mean embedding of the multi-dimensional value distribution under a proposed target policy. In our setting, the state-action variables are multi-dimensional and continuous. By mapping probability measures into a reproducing kernel Hilbert space via kernel mean embeddings, our method replaces Wasserstein metrics with an integral probability metric. This enables efficient estimation in multi-dimensional state-action spaces and reward settings, where direct computation of Wasserstein distances is computationally challenging. Theoretically, we establish contraction properties of the distributional Bellman operator under our proposed metric involving the Matérn family of kernels and provide uniform convergence guarantees. Simulations…
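
To make the metric in the abstract concrete, here is a minimal sketch of how a distance between kernel mean embeddings (the maximum mean discrepancy, an integral probability metric) can be estimated from samples with a Matérn kernel. It is not the KE-DRL estimator, which this page does not describe. The function names, the smoothness choice ν = 3/2, the length scale, and the synthetic two-dimensional "return" samples are all assumptions made for illustration.

```python
import numpy as np


def matern32_kernel(x, y, length_scale=1.0):
    """Matern kernel with smoothness nu = 3/2 between rows of x and rows of y.

    The choice nu = 3/2 is an assumption; the paper only names the Matern family.
    """
    # Pairwise Euclidean distances, shape (len(x), len(y)).
    diff = x[:, None, :] - y[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    scaled = np.sqrt(3.0) * dist / length_scale
    return (1.0 + scaled) * np.exp(-scaled)


def mmd_squared(x, y, length_scale=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples.

    This equals the squared RKHS distance between the two empirical
    kernel mean embeddings.
    """
    k_xx = matern32_kernel(x, x, length_scale)
    k_yy = matern32_kernel(y, y, length_scale)
    k_xy = matern32_kernel(x, y, length_scale)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Hypothetical two-dimensional return samples (synthetic, for illustration only).
rng = np.random.default_rng(0)
returns_a = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
returns_b = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
returns_c = rng.normal(loc=2.0, scale=1.0, size=(200, 2))

same = mmd_squared(returns_a, returns_b)
shifted = mmd_squared(returns_a, returns_c)
print(f"same distribution:    {same:.4f}")
print(f"shifted distribution: {shifted:.4f}")
```

The estimate needs only pairwise kernel evaluations, so its cost grows with the number of samples rather than with the difficulty of solving an optimal transport problem in several dimensions. That is the kind of computational advantage over Wasserstein distances the abstract points to.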

Taxonomy

Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques