Learning with Good Feature Representations in Bandits and in RL with a Generative Model

Tor Lattimore; Csaba Szepesvari; Gellert Weisz

arXiv:1911.07676·stat.ML·February 20, 2020·21 cites

TL;DR

This paper shows that when linear features approximate the rewards or action-values up to a small uniform error ϵ, a learner can find near-optimal actions in bandits and in RL with a generative model by examining only a few actions, using the Kiefer-Wolfowitz theorem.

Contribution

It provides theoretical bounds on how the feature approximation error ϵ affects suboptimality and sample efficiency in bandits and RL; the bounds depend on the features only through their dimension d and their approximation error.

Findings

01

A positive result using the Kiefer-Wolfowitz theorem for action selection

02

Regret bound of order $\sqrt{dn \log(k)} + ϵ n \sqrt{d} \log(n)$ in linear bandits

03

Approximate policy iteration achieves near-optimal policies with bounded error
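
The $\sqrt{d}$ factor in these bounds can be traced to a short calculation. What follows is a sketch under simplifying assumptions, not the paper's exact argument: rewards are observed without noise, the action set $\mathcal{A} \subset \mathbb{R}^{d}$ is finite and spans $\mathbb{R}^{d}$, and the symbols $\pi$, $V(\pi)$, $\eta_a$, $r_a$ and $\hat\theta$ are notation introduced here for illustration.

```latex
% Assumption: r_a = <a, theta> + eta_a with |eta_a| <= epsilon for every a in A.
% Kiefer-Wolfowitz: there is a design pi on A, supported on at most
% d(d+1)/2 actions, such that
\[
V(\pi) = \sum_{a \in \mathcal{A}} \pi(a)\, a a^\top ,
\qquad
\max_{b \in \mathcal{A}} \|b\|_{V(\pi)^{-1}}^{2} = d .
\]
% Weighted least squares using only the actions in the support of pi:
\[
\hat\theta = V(\pi)^{-1} \sum_{a} \pi(a)\, a\, r_a
\quad\Longrightarrow\quad
\hat\theta - \theta = V(\pi)^{-1} \sum_{a} \pi(a)\, a\, \eta_a .
\]
% For any b in A, bound the misspecification term and apply Jensen's inequality:
\[
|\langle b, \hat\theta - \theta \rangle|
\le \epsilon \sum_{a} \pi(a)\, \bigl| b^\top V(\pi)^{-1} a \bigr|
\le \epsilon \sqrt{\sum_{a} \pi(a)\, \bigl( b^\top V(\pi)^{-1} a \bigr)^{2}}
= \epsilon\, \|b\|_{V(\pi)^{-1}}
\le \epsilon \sqrt{d} .
\]
```

Each predicted reward $\langle b, \hat\theta \rangle$ is then within $\epsilon (1 + \sqrt{d})$ of the true reward $r_b$, so the action that is greedy with respect to $\hat\theta$ is suboptimal by at most $2 \epsilon (1 + \sqrt{d})$.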

Abstract

The construction by Du et al. (2019) implies that even if a learner is given linear features in $\mathbb{R}^{d}$ that approximate the rewards in a bandit with a uniform error of $ϵ$, searching for an action that is optimal up to $O(ϵ)$ requires examining essentially all actions. We use the Kiefer-Wolfowitz theorem to prove a positive result: by checking only a few actions, a learner can always find an action that is suboptimal by at most $O(ϵ \sqrt{d})$. Thus, features are useful when the approximation error is small relative to the dimensionality of the features. The idea is applied to stochastic bandits and reinforcement learning with a generative model, where the learner has access to $d$-dimensional linear features that approximate the action-value functions for all policies to an accuracy of $ϵ$. For linear bandits, we prove a bound on…
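
The idea of checking only a few actions can be illustrated numerically. The sketch below is not the paper's algorithm: it assumes a finite action set with noiseless but misspecified rewards, and it swaps in Frank-Wolfe (Fedorov-Wynn) iterations to approximate the design whose existence the Kiefer-Wolfowitz theorem guarantees. The function names (`approx_g_optimal_design`, `design_norms`, `fit_on_design`, `demo`) and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def design_norms(A, pi):
    """Return ||a||^2 in the V(pi)^{-1} norm for every row a of A."""
    V = A.T @ (pi[:, None] * A)
    return np.sum(A * np.linalg.solve(V, A.T).T, axis=1)

def approx_g_optimal_design(A, iters=1000, tol=0.01, seed=0):
    """Frank-Wolfe iterations towards a G-optimal design over the rows of A.

    Assumption: the d randomly chosen starting rows span R^d, which holds
    almost surely for the Gaussian actions used in demo().
    """
    k, d = A.shape
    rng = np.random.default_rng(seed)
    pi = np.zeros(k)
    pi[rng.choice(k, size=d, replace=False)] = 1.0 / d
    for _ in range(iters):
        g = design_norms(A, pi)
        i = int(np.argmax(g))
        if g[i] <= (1.0 + tol) * d:  # Kiefer-Wolfowitz: the optimum equals d
            break
        step = (g[i] / d - 1.0) / (g[i] - 1.0)  # exact line search for log det
        pi *= 1.0 - step
        pi[i] += step  # each iteration adds at most one action to the support
    return pi

def fit_on_design(A, pi, rewards):
    """Weighted least squares that reads rewards only where pi is nonzero."""
    S = np.flatnonzero(pi)
    V = A[S].T @ (pi[S, None] * A[S])
    return np.linalg.solve(V, A[S].T @ (pi[S] * rewards[S]))

def demo(k=2000, d=5, eps=0.1, seed=1):
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(k, d))          # hypothetical action features
    theta = rng.normal(size=d)           # hypothetical true parameter
    rewards = A @ theta + rng.uniform(-eps, eps, size=k)  # uniform error <= eps
    pi = approx_g_optimal_design(A)
    theta_hat = fit_on_design(A, pi, rewards)
    err = np.max(np.abs(A @ (theta_hat - theta)))
    # For any design, the error is at most eps * max_a ||a||_{V(pi)^{-1}}.
    bound = eps * np.sqrt(np.max(design_norms(A, pi)))
    return pi, err, bound, int(np.count_nonzero(pi))

if __name__ == "__main__":
    pi, err, bound, n_queried = demo()
    print(f"queried {n_queried} actions, max error {err:.4f}, bound {bound:.4f}")
```

The estimate is built from the rewards of the design's support only, yet its prediction error is controlled for every action. As the design approaches G-optimality, the largest squared norm approaches $d$ and the error bound approaches $ϵ \sqrt{d}$. Starting the iterations from $d$ actions rather than from the uniform distribution is a design choice that keeps the support small, since each step can add at most one new action.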

Videos

Learning with Good Feature Representations in Bandits and in RL with a Generative Model· slideslive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Reinforcement Learning in Robotics