Value Function Approximations via Kernel Embeddings for No-Regret Reinforcement Learning

Sayak Ray Chowdhury; Rafael Oliveira

arXiv:2011.07881 · cs.LG · June 29, 2022

Open Access

TL;DR

This paper introduces a kernel-embedding-based RL algorithm that learns representations of transition distributions in a reproducing kernel Hilbert space. It comes with a regret guarantee, does not require estimating transition probabilities, and is suited to large or continuous state-action spaces.

Contribution

The paper proposes CME-RL, an online model-based RL algorithm using kernel embeddings for transition distributions, providing regret guarantees and bypassing transition probability estimation.
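To make the idea of embedding transition distributions concrete, here is a minimal sketch of a standard conditional mean embedding (CME) estimate of the expected next-state value $E[V(s') \mid s, a]$ computed directly from observed transitions. It is an illustration only, not the paper's CME-RL algorithm: the Gaussian kernel, the regularization `lam`, the function names, and the toy linear dynamics are all assumptions made for this example, and the exploration component of CME-RL is omitted.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=0.5):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    X, Y = np.atleast_2d(X), np.atleast_2d(Y)
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def cme_weights(Z_train, Z_query, lam=1e-2, lengthscale=0.5):
    """Weights alpha(z) = (K + lam * I)^{-1} k(z) over the observed transitions."""
    K = rbf_kernel(Z_train, Z_train, lengthscale)
    k = rbf_kernel(Z_train, Z_query, lengthscale)  # shape (n, m)
    return np.linalg.solve(K + lam * np.eye(len(Z_train)), k)

def expected_next_value(V, S_next, Z_train, Z_query, lam=1e-2, lengthscale=0.5):
    """Estimate E[V(s') | z] as a weighted sum of V at the observed next states."""
    alpha = cme_weights(Z_train, Z_query, lam, lengthscale)
    return alpha.T @ V(S_next)  # shape (m,)

# Toy data (hypothetical dynamics): z = (s, a), s' = 0.8 s + 0.1 a + noise.
rng = np.random.default_rng(0)
Z = rng.uniform(-1.0, 1.0, size=(400, 2))
S_next = 0.8 * Z[:, 0] + 0.1 * Z[:, 1] + 0.05 * rng.standard_normal(400)

value_fn = lambda s: s ** 2          # a stand-in value function
query = np.array([[0.5, -0.5]])      # one state-action pair
estimate = expected_next_value(value_fn, S_next, Z, query)
print("CME estimate of E[V(s') | s, a]:", estimate[0])
```

No transition density is ever estimated here: the next-state expectation is a linear combination of value-function evaluations at sampled next states, which is what allows this kind of estimator to work on continuous spaces.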

Findings

01

Achieves a regret bound of order $\tilde{O}(H \gamma_N \sqrt{N})$

02

Applies to any domain on which a kernel can be defined, including large or continuous state-action spaces

03

Offers new insights into kernel methods for approximate inference and RL regret minimization
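As a short worked step connecting the first finding to the "no-regret" claim in the title: dividing a cumulative regret bound of order $\tilde{O}(H \gamma_N \sqrt{N})$ by the number of steps $N$ gives the average per-step regret. Writing $R_N$ for the cumulative regret (a symbol introduced here for illustration),

$$
\frac{R_N}{N} = \tilde{O}\!\left(\frac{H \gamma_N \sqrt{N}}{N}\right) = \tilde{O}\!\left(\frac{H \gamma_N}{\sqrt{N}}\right),
$$

which vanishes as $N \to \infty$ whenever $\gamma_N$ grows more slowly than $\sqrt{N}$, up to poly-logarithmic factors. Whether that growth condition holds depends on the kernel and is an assumption here, not something stated in this summary.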

Abstract

We consider the regret minimization problem in reinforcement learning (RL) in the episodic setting. In many real-world RL environments, the state and action spaces are continuous or very large. Existing approaches establish regret guarantees through either a low-dimensional representation of the stochastic transition model or an approximation of the $Q$-functions. However, the understanding of function approximation schemes for state-value functions largely remains missing. In this paper, we propose an online model-based RL algorithm, namely CME-RL, that learns representations of transition distributions as embeddings in a reproducing kernel Hilbert space while carefully balancing the exploration-exploitation tradeoff. We demonstrate the efficiency of our algorithm by proving a frequentist (worst-case) regret bound that is of order $\tilde{O}(H \gamma_{N} \sqrt{N})$ \footnote{…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)