Model-Based Reinforcement Learning with Value-Targeted Regression

Alex Ayoub; Zeyu Jia; Csaba Szepesvari; Mengdi Wang; Lin F. Yang

arXiv:2006.01107 · cs.LG · June 2, 2020 · 71 cites

TL;DR

This paper introduces a model-based reinforcement learning algorithm that uses value-targeted regression and optimism to achieve near-optimal regret bounds in finite-horizon episodic settings, especially for linear mixture models.

Contribution

It proposes a novel optimism-based RL algorithm whose regret bounds do not depend on the sizes of the state and action spaces. The analysis also covers general model families, with the bound expressed in terms of the Eluder dimension.

Findings

01

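Regret bound of $\tilde{O}(d\sqrt{H^{3}T})$ for linear mixture models, where $H$ is the horizon, $T$ the total number of steps, and $d$ the dimension of $\theta$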

02

Regret bounds are close to the theoretical lower bounds

03

For general model families, the regret bound is derived in terms of the Eluder dimension

Abstract

This paper studies model-based reinforcement learning (RL) for regret minimization. We focus on finite-horizon episodic RL where the transition model $P$ belongs to a known family of models $\mathcal{P}$, a special case of which is when models in $\mathcal{P}$ take the form of linear mixtures: $P_{\theta} = \sum_{i=1}^{d} \theta_{i} P_{i}$. We propose a model-based RL algorithm that is based on the optimism principle: in each episode, the set of models that are "consistent" with the data collected is constructed. The criterion of consistency is based on the total squared error that the model incurs on the task of predicting values, as determined by the last value estimate, along the transitions. The next value function is then chosen by solving the optimistic planning problem with the constructed set of models. We derive a bound on the regret, which, in the special case of linear…
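The consistency criterion described in the abstract can be illustrated for the linear mixture case. The sketch below is not the authors' implementation; it only shows the regression step under simplifying assumptions: a small tabular problem, a single fixed value function `V` (the paper's algorithm updates the value estimate across episodes), and a ridge penalty `lam` chosen arbitrarily. All names (`fit_theta_value_targeted`, `simulate`, `base_models`) are hypothetical. The idea is that under $P_{\theta}$ the expected next-state value is linear in $\theta$, with features $x_i = \sum_{s'} P_i(s' \mid s, a) V(s')$, so $\theta$ can be estimated by regressing the observed value $V(s')$ on $x$.

```python
import numpy as np


def fit_theta_value_targeted(base_models, V, transitions, lam=1.0):
    """Ridge regression of observed next-state values on per-model expected values.

    base_models: array of shape (d, S, A, S); base_models[i, s, a] is P_i(. | s, a).
    V:           array of shape (S,), the value estimate used as regression target.
    transitions: tuple (s, a, s_next) of integer arrays of equal length n.
    lam:         ridge penalty (an illustrative choice, not taken from the paper).
    """
    d = base_models.shape[0]
    s, a, s_next = transitions
    # Feature x[n, i] = E_{s' ~ P_i(. | s_n, a_n)}[V(s')].
    X = (base_models[:, s, a, :] @ V).T          # shape (n, d)
    # Regression target: value of the next state that was actually observed.
    y = V[s_next]                                # shape (n,)
    gram = lam * np.eye(d) + X.T @ X
    theta_hat = np.linalg.solve(gram, X.T @ y)
    # gram is the matrix one would use to shape an ellipsoidal set of
    # "consistent" parameters around theta_hat; its radius is not specified here.
    return theta_hat, gram


def simulate(seed=0, d=3, n_states=6, n_actions=4, n=50_000):
    """Sample transitions from a random linear mixture and recover theta."""
    rng = np.random.default_rng(seed)
    # Random base models P_1..P_d; each row is a distribution over next states.
    base_models = rng.dirichlet(np.ones(n_states), size=(d, n_states, n_actions))
    theta = rng.dirichlet(np.ones(d))            # true mixture weights
    P_theta = np.tensordot(theta, base_models, axes=1)   # shape (S, A, S)
    V = rng.uniform(0.0, 1.0, size=n_states)     # fixed value function (assumption)

    s = rng.integers(0, n_states, size=n)
    a = rng.integers(0, n_actions, size=n)
    # Inverse-CDF sampling of s' ~ P_theta(. | s, a) for all samples at once.
    cdf = np.cumsum(P_theta[s, a], axis=1)
    u = rng.random(n)
    s_next = np.minimum((u[:, None] > cdf).sum(axis=1), n_states - 1)

    theta_hat, _ = fit_theta_value_targeted(base_models, V, (s, a, s_next))
    return theta, theta_hat


if __name__ == "__main__":
    theta, theta_hat = simulate()
    print("true theta     :", np.round(theta, 3))
    print("estimated theta:", np.round(theta_hat, 3))
```

Note that the regression never models the full next-state distribution; it only has to predict a scalar value, which is why the state and action space sizes do not enter the estimation problem directly.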

Videos

Model-Based Reinforcement Learning with Value-Targeted Regression · YouTube

Model-Based Reinforcement Learning with Value-Targeted Regression · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management