Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound

Lin F. Yang; Mengdi Wang

arXiv:1905.10389·cs.LG·June 14, 2019·41 cites

TL;DR

This paper introduces MatrixRL, an online reinforcement learning algorithm that learns low-dimensional representations of transition models using features or kernels, achieving near-optimal regret bounds in high-dimensional settings.

Contribution

The paper presents the first near-optimal regret bounds for RL with feature and kernel representations, extending theoretical guarantees to high-dimensional and kernelized models.

Findings

01

MatrixRL achieves a regret bound of $O(H^{2} d \log T \sqrt{T})$ with features.

02

Kernelized MatrixRL achieves a regret bound of $O(H^{2} \tilde{d} \log T \sqrt{T})$ with kernels.

03

First regret bounds for feature and kernel-based RL that are near-optimal in T and dimension.

Abstract

Exploration in reinforcement learning (RL) suffers from the curse of dimensionality when the state-action space is large. A common practice is to parameterize the high-dimensional value and policy functions using given features. However, existing methods either have no theoretical guarantee or suffer a regret that is exponential in the planning horizon $H$. In this paper, we propose an online RL algorithm, called MatrixRL, that leverages ideas from linear bandits to learn a low-dimensional representation of the probability transition model while carefully balancing the exploitation-exploration tradeoff. We show that MatrixRL achieves a regret bound of $O(H^{2} d \log T \sqrt{T})$, where $d$ is the number of features. MatrixRL has an equivalent kernelized version, which is able to work with an arbitrary kernel Hilbert space without using explicit features. In this case, the kernelized…
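
As a rough illustration of what learning a low-dimensional representation of the transition model could look like, the Python sketch below assumes a bilinear model $P(s' \mid s, a) = \phi(s, a)^\top M \psi(s')$ with a small core matrix $M$, a form the truncated abstract above does not spell out. The features `Phi` and `Psi`, the core matrix `M_true`, and the uniform sampling of state-action pairs are all hypothetical choices made for the demonstration. The sketch only fits $M$ by ridge regression, in the spirit of linear bandits; it leaves out the optimistic exploration and planning that a full algorithm would need.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d = 6, 2, 3
n_pairs = n_states * n_actions

# Hypothetical features (assumption, not taken from the paper):
# each row of Phi is a distribution over d latent factors, and
# each column of Psi is a distribution over next states.
Phi = rng.dirichlet(np.ones(d), size=n_pairs)        # shape (S*A, d)
Psi = rng.dirichlet(np.ones(n_states), size=d).T     # shape (S, d)
M_true = np.eye(d)                                   # assumed core matrix
P = Phi @ M_true @ Psi.T                             # shape (S*A, S), rows sum to 1

K_psi = Psi.T @ Psi      # Gram matrix of the next-state features
A = np.eye(d)            # ridge-regularised covariance of phi
B = np.zeros((d, d))     # running sum of phi(s, a) psi(s')^T

n_samples = 20000
for _ in range(n_samples):
    i = rng.integers(n_pairs)                 # uniform (s, a); no optimism here
    s_next = rng.choice(n_states, p=P[i])     # sample next state from the true model
    A += np.outer(Phi[i], Phi[i])
    B += np.outer(Phi[i], Psi[s_next])

# Ridge-regression estimate of the core matrix and the implied transition table.
M_hat = np.linalg.solve(A, B) @ np.linalg.inv(K_psi)
P_hat = Phi @ M_hat @ Psi.T
max_err = np.abs(P_hat - P).max()
print(f"max abs error in estimated transition probabilities: {max_err:.4f}")
```

In this toy setting only the $d \times d$ matrix is estimated, rather than one distribution per state-action pair, which is the kind of dimension reduction the abstract alludes to.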

Videos

Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms