Vector-Valued Distributional Reinforcement Learning Policy Evaluation: A Hilbert Space Embedding Approach
Mehrdad Mohammadi, Qi Zheng, Ruoqing Zhu

TL;DR
This paper introduces a Hilbert space embedding approach for multi-dimensional distributional reinforcement learning. It enables efficient policy evaluation in continuous, multi-dimensional state-action spaces by replacing Wasserstein metrics with an integral probability metric built on kernel mean embeddings.
Contribution
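It proposes an offline, off-policy evaluation framework (KE-DRL) that estimates the kernel mean embedding of the multi-dimensional value distribution in a reproducing kernel Hilbert space, supported by contraction and uniform convergence guarantees and by simulations.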
Findings
Robust off-policy evaluation demonstrated in simulations.
Contraction of the distributional Bellman operator under a kernel-based metric involving the Matern family of kernels.
Effective estimation of multi-dimensional value distributions.
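For readers unfamiliar with the objects behind these findings, the standard definitions can be written out as below. The first two lines are textbook definitions of the kernel mean embedding and the induced integral probability metric (the maximum mean discrepancy). The last two lines are only an assumed generic shape for a contraction statement: the supremum metric \bar{d}, the symbols \eta, \mathcal{T}^{\pi} and the constant c are illustrative notation, not taken from the paper, and the paper's exact metric and constant are not given in this summary.

```latex
% Kernel mean embedding of a probability measure P into the RKHS H of kernel k
\mu_P \;=\; \mathbb{E}_{X \sim P}\bigl[k(X,\cdot)\bigr] \;\in\; \mathcal{H}

% Induced integral probability metric (maximum mean discrepancy)
\mathrm{MMD}_k(P,Q) \;=\; \lVert \mu_P - \mu_Q \rVert_{\mathcal{H}}
  \;=\; \sup_{\lVert f \rVert_{\mathcal{H}} \le 1}
        \bigl|\, \mathbb{E}_{P}[f(X)] - \mathbb{E}_{Q}[f(Y)] \,\bigr|

% Assumed supremum metric over state-action pairs (illustrative notation)
\bar{d}(\eta,\eta') \;=\; \sup_{(s,a)} \mathrm{MMD}_k\bigl(\eta(s,a),\, \eta'(s,a)\bigr)

% Generic form of a contraction property, with an unspecified constant c
\bar{d}\bigl(\mathcal{T}^{\pi}\eta,\; \mathcal{T}^{\pi}\eta'\bigr)
  \;\le\; c\,\bar{d}(\eta,\eta'), \qquad 0 \le c < 1
```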
Abstract
We propose an (offline) multi-dimensional distributional reinforcement learning framework (KE-DRL) that leverages Hilbert space mappings to estimate the kernel mean embedding of the multi-dimensional value distribution under a proposed target policy. In our setting, the state-action variables are multi-dimensional and continuous. By mapping probability measures into a reproducing kernel Hilbert space via kernel mean embeddings, our method replaces Wasserstein metrics with an integral probability metric. This enables efficient estimation in multi-dimensional state-action spaces and reward settings, where direct computation of Wasserstein distances is computationally challenging. Theoretically, we establish contraction properties of the distributional Bellman operator under our proposed metric involving the Matern family of kernels and provide uniform convergence guarantees. Simulations…
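As a rough illustration of why a kernel-based metric is convenient in several dimensions, the sketch below computes a biased empirical estimate of the squared maximum mean discrepancy between two samples of two-dimensional returns, using a Matern kernel with smoothness 3/2. It is not the KE-DRL estimator from the paper. The function names, the choice of smoothness 3/2, the length scale, and the synthetic Gaussian samples are all assumptions made for this example.

```python
import numpy as np


def matern32_kernel(x, y, length_scale=1.0):
    """Matern kernel with smoothness 3/2 between rows of x (n, d) and y (m, d)."""
    # Pairwise Euclidean distances, shape (n, m).
    diff = x[:, None, :] - y[None, :, :]
    r = np.sqrt(np.sum(diff ** 2, axis=-1))
    s = np.sqrt(3.0) * r / length_scale
    return (1.0 + s) * np.exp(-s)


def mmd_squared(x, y, length_scale=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    kxx = matern32_kernel(x, x, length_scale)
    kyy = matern32_kernel(y, y, length_scale)
    kxy = matern32_kernel(x, y, length_scale)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


# Hypothetical two-dimensional return samples (synthetic, for illustration only).
rng = np.random.default_rng(0)
returns_a = rng.normal(0.0, 1.0, size=(200, 2))
returns_b = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution as returns_a
returns_c = rng.normal(2.0, 1.0, size=(200, 2))  # mean shifted in both coordinates

mmd_same = mmd_squared(returns_a, returns_b)
mmd_shifted = mmd_squared(returns_a, returns_c)
print(f"same distribution: {mmd_same:.4f}, shifted distribution: {mmd_shifted:.4f}")
```

The estimate needs only pairwise kernel evaluations, so its cost grows with the number of samples rather than with an optimal-transport problem over the return dimension, which is the practical appeal of swapping out Wasserstein distances in this setting.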
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
