Exploration in Model-based Reinforcement Learning with Randomized Reward

Lingxiao Wang; Ping Li

arXiv:2301.03142 · stat.ML · January 10, 2023

TL;DR

This paper investigates reward randomization as an exploration mechanism in model-based reinforcement learning, showing that under the KNR model it guarantees partial optimism and yields near-optimal worst-case regret in terms of the number of interactions.

Contribution

The paper provides the first worst-case regret analysis of randomized MBRL with function approximation, extends the theory to general function approximation, and proposes concrete reward randomization methods.

Findings

01. Reward randomization guarantees partial optimism under the KNR model.

02. Partial optimism in turn yields near-optimal worst-case regret in terms of the number of interactions.

03. Conditions under which reward randomization is effective are identified and illustrated with examples.

Abstract

Model-based reinforcement learning (MBRL) has been widely adopted due to its sample efficiency. However, existing worst-case regret analyses typically require optimistic planning, which is not realistic in general. In contrast, empirical studies, motivated by the theory, use ensembles of models, which achieve state-of-the-art performance on various testing environments. This gap between theory and practice leads us to ask whether a randomized model ensemble guarantees optimism, and hence the optimal worst-case regret. This paper partially answers this question from the perspective of reward randomization, a scarcely explored direction of exploration in MBRL. We show that under the kernelized nonlinear regulator (KNR) model, reward randomization guarantees partial optimism, which further yields a near-optimal worst-case regret in terms of the number of interactions. We…
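To make the idea of reward randomization concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a tabular special case (one-hot features), in which the perturbation for each state-action pair is an independent Gaussian whose standard deviation shrinks with the visit count. The class name `RandomizedReward` and the parameters `sigma` and `lam` are hypothetical, introduced only for illustration.

```python
import math
import random
from collections import defaultdict


class RandomizedReward:
    """Illustrative only: adds count-scaled Gaussian noise to a base reward.

    Assumption: one-hot (tabular) features, so the noise for each
    (state, action) pair is independent with variance
    sigma^2 / (lam + visit_count).
    """

    def __init__(self, sigma=1.0, lam=1.0, seed=None):
        self.sigma = sigma              # noise scale (hypothetical parameter)
        self.lam = lam                  # regularizer, acts like a prior count
        self.counts = defaultdict(int)  # visits per (state, action)
        self.rng = random.Random(seed)
        self.noise = {}                 # perturbations for the current episode

    def update(self, state, action):
        """Record one more visit to (state, action)."""
        self.counts[(state, action)] += 1

    def noise_std(self, state, action):
        """Standard deviation of the perturbation; smaller for well-visited pairs."""
        return self.sigma / math.sqrt(self.lam + self.counts[(state, action)])

    def resample(self, pairs):
        """Draw a fresh perturbation for each pair, e.g. once per episode."""
        self.noise = {p: self.rng.gauss(0.0, self.noise_std(*p)) for p in pairs}

    def reward(self, state, action, base_reward):
        """Base reward plus the perturbation drawn by the last resample()."""
        return base_reward + self.noise.get((state, action), 0.0)
```

A planner would call `resample` at the start of each episode and then plan against `reward` instead of the unperturbed reward. Rarely visited pairs receive larger perturbations, so they are more likely to look attractive in some episodes, which is the intuition behind randomized exploration with a chance of optimism.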

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Receptor Mechanisms and Signaling · Reinforcement Learning in Robotics