Exploration in Model-based Reinforcement Learning with Randomized Reward
Lingxiao Wang, Ping Li

TL;DR
This paper investigates reward randomization in model-based reinforcement learning, showing that under the kernelized nonlinear regulator (KNR) model it guarantees partial optimism and yields near-optimal worst-case regret in the number of interactions.
Contribution
It provides the first worst-case regret analysis of randomized MBRL with function approximation, extending theory to generalized settings and proposing concrete reward randomization methods.
Findings
Reward randomization guarantees partial optimism under KNR models.
This partial optimism yields near-optimal worst-case regret in terms of the number of interactions.
Conditions for effective reward randomization are identified and exemplified.
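
The findings above can be illustrated with a minimal sketch. It is not the authors' algorithm, which this page does not reproduce. It assumes a KNR-style transition model s' = W φ(s, a) + noise, a ridge-regression estimate of W, and a Gaussian reward perturbation ξ ~ N(0, scale² Λ⁻¹) that enters the reward as φ(s, a)ᵀξ. The function names (fit_knr, sample_reward_perturbation, randomized_reward) and the scale parameter are hypothetical.

```python
import numpy as np


def fit_knr(Phi, S_next, lam=1.0):
    """Ridge estimate of W in s' ~ W @ phi(s, a).

    Phi:    (n, d) array of features phi(s, a)
    S_next: (n, p) array of observed next states
    Returns W_hat with shape (p, d) and the regularized covariance Lambda (d, d).
    """
    d = Phi.shape[1]
    Lambda = Phi.T @ Phi + lam * np.eye(d)
    W_hat = np.linalg.solve(Lambda, Phi.T @ S_next).T
    return W_hat, Lambda


def sample_reward_perturbation(Lambda, scale, rng):
    """Draw xi ~ N(0, scale^2 * Lambda^{-1}) (an assumed noise model)."""
    d = Lambda.shape[0]
    L = np.linalg.cholesky(Lambda)      # Lambda = L @ L.T
    z = rng.standard_normal(d)
    # Solving L.T @ xi = z gives Cov(xi) = (L @ L.T)^{-1} = Lambda^{-1}.
    return scale * np.linalg.solve(L.T, z)


def randomized_reward(r, phi, xi):
    """Perturbed reward r + phi^T xi, to be used when planning in the fitted model."""
    return r + phi @ xi
```

Under these assumptions the perturbation along a feature direction φ has variance scale² φᵀΛ⁻¹φ, so it shrinks in directions the data already covers and stays large in unexplored ones. One perturbation would be drawn per episode and held fixed while planning against the fitted model.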
Abstract
Model-based Reinforcement Learning (MBRL) has been widely adopted due to its sample efficiency. However, existing worst-case regret analyses typically require optimistic planning, which is not realistic in general. In contrast, motivated by the theory, empirical studies use ensembles of models, which achieve state-of-the-art performance on various testing environments. This gap between theory and practice leads us to ask whether randomized model ensembles guarantee optimism, and hence the optimal worst-case regret. This paper partially answers this question from the perspective of reward randomization, a scarcely explored direction of exploration with MBRL. We show that under the kernelized nonlinear regulator (KNR) model, reward randomization guarantees a partial optimism, which further yields a near-optimal worst-case regret in terms of the number of interactions. We…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Receptor Mechanisms and Signaling · Reinforcement Learning in Robotics
