Model-Based Reinforcement Learning with Value-Targeted Regression
Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang

TL;DR
This paper introduces a model-based reinforcement learning algorithm that uses value-targeted regression and optimism to achieve near-optimal regret bounds in finite-horizon episodic settings, especially for linear mixture models.
Contribution
It proposes a novel optimism-based RL algorithm whose regret bounds do not depend on the sizes of the state and action spaces, and which extend to general model families through the Eluder dimension.
Findings
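Regret bound of $\tilde{O}(d\sqrt{H^3 T})$ for linear mixture models, where $H$ is the horizon, $T$ the total number of steps, and $d$ the dimension of $\theta$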
Regret bounds are close to the theoretical lower bounds
For general model families, the regret bound is expressed in terms of the Eluder dimension
Abstract
This paper studies model-based reinforcement learning (RL) for regret minimization. We focus on finite-horizon episodic RL where the transition model $P$ belongs to a known family of models $\mathcal{P}$, a special case of which is when models in $\mathcal{P}$ take the form of linear mixtures: $P_{\theta} = \sum_{i=1}^{d} \theta_{i} P_{i}$. We propose a model-based RL algorithm that is based on the optimism principle: in each episode, the set of models that are "consistent" with the data collected is constructed. The criterion of consistency is based on the total squared error that the model incurs on the task of predicting values, as determined by the last value estimate, along the transitions. The next value function is then chosen by solving the optimistic planning problem with the constructed set of models. We derive a bound on the regret, which, in the special case of linear…
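The consistency criterion described in the abstract can be illustrated for the linear mixture case with a small NumPy sketch. This is only a sketch under stated assumptions, not the authors' implementation: the function names (`vtr_features`, `vtr_fit`, `in_confidence_set`, `simulate`), the ridge parameter `lam`, the radius `beta`, and the tabular setup are all hypothetical. Each transition $(s, a, s')$ is paired with a value estimate $V$; the regression input is the vector of expected values under each basis model $P_i$, and the regression target is $V(s')$. Random vectors stand in for the algorithm's value estimates, and the optimistic planning step is not shown.

```python
import numpy as np

def vtr_features(basis, V, s, a):
    """Entry i is sum_{s'} P_i(s' | s, a) * V(s'); basis has shape (d, S, A, S)."""
    return basis[:, s, a, :] @ V

def vtr_fit(basis, transitions, values, lam=1.0):
    """Ridge regression of V(s') on the value features (lam is an assumed regularizer)."""
    d = basis.shape[0]
    M = lam * np.eye(d)          # regularized Gram matrix
    b = np.zeros(d)
    for (s, a, s_next), V in zip(transitions, values):
        x = vtr_features(basis, V, s, a)
        M += np.outer(x, x)
        b += x * V[s_next]       # target: value at the observed next state
    theta_hat = np.linalg.solve(M, b)
    return theta_hat, M

def in_confidence_set(theta, theta_hat, M, beta):
    """Ellipsoidal consistency check; beta is a hypothetical radius."""
    diff = theta - theta_hat
    return float(diff @ M @ diff) <= beta

def simulate(n=2000, d=3, S=5, A=2, seed=0):
    """Toy data: random basis models and random stand-in value estimates."""
    rng = np.random.default_rng(seed)
    basis = rng.dirichlet(np.ones(S), size=(d, S, A))   # shape (d, S, A, S)
    theta = rng.dirichlet(np.ones(d))                    # true mixture weights
    P = np.tensordot(theta, basis, axes=1)               # P_theta, shape (S, A, S)
    transitions, values = [], []
    for _ in range(n):
        s, a = rng.integers(S), rng.integers(A)
        s_next = rng.choice(S, p=P[s, a])
        transitions.append((s, a, s_next))
        values.append(rng.uniform(0.0, 1.0, size=S))     # placeholder for V
    theta_hat, M = vtr_fit(basis, transitions, values)
    return theta, theta_hat, M
```

Calling `simulate()` returns the true weights, the estimate, and the Gram matrix, so the two weight vectors can be compared directly. Because the regression targets are scalar values rather than full next-state distributions, the fit depends only on the dimension `d` of the mixture, not on the number of states or actions.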
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
