Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

Mengfan Xu; Diego Klabjan

arXiv:2009.09538·cs.LG·May 7, 2024·1 cites

Open Access

TL;DR

This paper introduces new EXP-based algorithms for exploration in reinforcement learning and bandits with unbounded rewards, providing theoretical regret bounds and demonstrating improved exploration in complex environments.

Contribution

The paper proposes the EXP4.P and EXP4-RL algorithms, which handle unbounded rewards and extend exploration strategies to reinforcement learning, with theoretical guarantees and empirical validation.

Findings

01. EXP4.P has provable regret upper bounds for bandits with unbounded rewards.

02. Including a competent expert leads to global optimality in linear bandits.

03. EXP4-RL outperforms existing methods in complex exploration tasks.

Abstract

We study the challenging exploration incentive problem in both bandit and reinforcement learning, where the rewards are scale-free and potentially unbounded, driven by real-world scenarios and differing from existing work. Past works in reinforcement learning either assume costly interactions with an environment or propose algorithms finding potentially low quality local maxima. Motivated by EXP-type methods that integrate multiple agents (experts) for exploration in bandits with the assumption that rewards are bounded, we propose new algorithms, namely EXP4.P and EXP4-RL for exploration in the unbounded reward case, and demonstrate their effectiveness in these new settings. Unbounded rewards introduce challenges as the regret cannot be limited by the number of trials, and selecting suboptimal arms may lead to infinite regret. Specifically, we establish EXP4.P's regret upper bounds in…
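For readers unfamiliar with the EXP-type methods the abstract builds on, the following is a minimal sketch of the classic EXP4 update for bounded rewards in [0, 1]. It is not the paper's EXP4.P or EXP4-RL, and it does not handle unbounded rewards. The function name `exp4`, the callbacks `advice_fn` and `reward_fn`, and the two-expert toy problem are all hypothetical and chosen only for illustration.

```python
import numpy as np

def exp4(advice_fn, reward_fn, n_arms, n_experts, n_rounds, gamma=0.1, seed=0):
    """Classic EXP4 for rewards in [0, 1] (illustrative sketch, not EXP4.P)."""
    rng = np.random.default_rng(seed)
    log_w = np.zeros(n_experts)  # log-weights, kept in log space for stability
    total_reward = 0.0
    for t in range(n_rounds):
        # advice has shape (n_experts, n_arms); each row is a distribution over arms
        advice = np.asarray(advice_fn(t), dtype=float)
        w = np.exp(log_w - log_w.max())
        # Mix the weighted expert advice with uniform exploration
        probs = (1.0 - gamma) * (w @ advice) / w.sum() + gamma / n_arms
        probs = probs / probs.sum()  # guard against floating-point drift
        arm = rng.choice(n_arms, p=probs)
        reward = reward_fn(t, arm)
        # Importance-weighted reward estimate: nonzero only for the pulled arm
        est = np.zeros(n_arms)
        est[arm] = reward / probs[arm]
        # Each expert is credited with the estimated reward of its own advice
        log_w += gamma * (advice @ est) / n_arms
        total_reward += reward
    w = np.exp(log_w - log_w.max())
    return w / w.sum(), total_reward

# Toy problem (assumed): expert 0 always recommends arm 0, expert 1 always arm 1,
# and only arm 1 pays a reward.
fixed_advice = np.array([[1.0, 0.0], [0.0, 1.0]])
weights, total = exp4(
    advice_fn=lambda t: fixed_advice,
    reward_fn=lambda t, arm: 1.0 if arm == 1 else 0.0,
    n_arms=2, n_experts=2, n_rounds=500,
)
print(weights, total)
```

The division by `probs[arm]` is where the bounded-reward assumption matters: the estimate stays controlled only because the reward is at most 1 and every arm has probability at least `gamma / n_arms`. With unbounded rewards that control is lost, which is the setting the abstract targets.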

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Artificial Intelligence in Games