Value Function Approximations via Kernel Embeddings for No-Regret Reinforcement Learning
Sayak Ray Chowdhury, Rafael Oliveira

TL;DR
This paper introduces a kernel-embedding-based RL algorithm that learns representations of transition distributions in a reproducing kernel Hilbert space. It achieves regret bounds without estimating transition probabilities, which makes it suitable for large or continuous state-action spaces.
Contribution
The paper proposes CME-RL, an online model-based RL algorithm using kernel embeddings for transition distributions, providing regret guarantees and bypassing transition probability estimation.
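To make "bypassing transition probability estimation" concrete, here is a minimal sketch of a conditional mean embedding estimator in Python. It is not the CME-RL algorithm itself (there is no exploration bonus or planning step), only the underlying idea: the expected next-state value at a state-action pair is estimated as a kernel-weighted combination of values at observed next states, so no transition density is ever fitted. The function names, the Gaussian kernel, the regularization constant, and the toy dynamics are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))

def cme_weights(Z, z_query, reg=1e-3, lengthscale=1.0):
    """Embedding weights alpha(z) = (K + reg * I)^{-1} k(z), one column per query point."""
    n = Z.shape[0]
    K = rbf_kernel(Z, Z, lengthscale)
    k_q = rbf_kernel(Z, z_query, lengthscale)
    return np.linalg.solve(K + reg * np.eye(n), k_q)

def expected_next_value(Z, S_next, value_fn, z_query, reg=1e-3, lengthscale=1.0):
    """Estimate E[V(s') | s, a] as a weighted sum of V at the observed next states."""
    alpha = cme_weights(Z, z_query, reg, lengthscale)
    return alpha.T @ value_fn(S_next)

# Toy data (hypothetical dynamics): s' = 0.9 s + a + Gaussian noise.
rng = np.random.default_rng(0)
n, sigma = 400, 0.1
S = rng.uniform(-1.0, 1.0, size=(n, 1))
A = rng.uniform(-1.0, 1.0, size=(n, 1))
S_next = 0.9 * S + A + sigma * rng.normal(size=(n, 1))
Z = np.hstack([S, A])  # observed state-action pairs

value_fn = lambda s: np.cos(s[:, 0])  # stand-in value function
z_query = np.array([[0.2, 0.3]])      # query pair (s, a)

estimate = expected_next_value(Z, S_next, value_fn, z_query, reg=0.1, lengthscale=0.5)[0]

# Closed form for this toy model: E[cos(m + eps)] = cos(m) * exp(-sigma^2 / 2).
truth = np.cos(0.9 * 0.2 + 0.3) * np.exp(-sigma**2 / 2)
```

In this sketch the estimator only ever touches sampled next states through `value_fn`, which is the sense in which a density model of the transitions is not needed. Comparing `estimate` against `truth` gives a quick sanity check on the toy problem.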
Findings
Achieves a regret bound of order $O(H \gamma_N \sqrt{N})$
Applies to large or continuous state-action spaces on which a suitable kernel can be defined
Provides new insights into kernel methods for RL
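For readers unfamiliar with the quantity being bounded, the following is a sketch of the usual episodic regret definition. The notation is an assumption, since this summary does not define it: $K$ is the number of episodes, $H$ the episode length, $N = KH$ the total number of steps, $\pi_k$ the policy played in episode $k$, $s_1^k$ its initial state, and $V_1^{\star}$, $V_1^{\pi_k}$ the optimal and achieved value functions at the first step.

```latex
\[
  \mathrm{Regret}(N) \;=\; \sum_{k=1}^{K} \Bigl( V_1^{\star}(s_1^k) - V_1^{\pi_k}(s_1^k) \Bigr),
  \qquad N = KH .
\]
% Dividing a bound of order H \gamma_N \sqrt{N} by N gives the average per-step regret:
\[
  \frac{\mathrm{Regret}(N)}{N} \;=\; O\!\left( \frac{H \gamma_N}{\sqrt{N}} \right),
\]
% which vanishes as N grows whenever \gamma_N grows more slowly than \sqrt{N}.
```

Under these assumptions, "no-regret" in the title corresponds to the average per-step regret tending to zero.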
Abstract
We consider the regret minimization problem in reinforcement learning (RL) in the episodic setting. In many real-world RL environments, the state and action spaces are continuous or very large. Existing approaches establish regret guarantees by either a low-dimensional representation of the stochastic transition model or an approximation of the Q-functions. However, the understanding of function approximation schemes for state-value functions largely remains missing. In this paper, we propose an online model-based RL algorithm, namely the CME-RL, that learns representations of transition distributions as embeddings in a reproducing kernel Hilbert space while carefully balancing the exploitation-exploration tradeoff. We demonstrate the efficiency of our algorithm by proving a frequentist (worst-case) regret bound that is of order \footnote{…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
