Nearly optimal exploration-exploitation decision thresholds

Christos Dimitrakakis

arXiv:cs/0604010·cs.AI·June 6, 2018

Open Access

TL;DR

This paper derives near-optimal decision thresholds for exploration and exploitation in reinforcement learning, linking planning horizon and uncertainty, and introduces a bagging approach for efficient posterior sampling.

Contribution

It presents explicit upper bounds for action utility in multi-armed bandits, generalizes Thompson sampling, and introduces bagging via online bootstrapping for reinforcement learning.

Findings

1. Proposed decision thresholds improve exploration-exploitation balance.

2. Experimental results show competitive performance with existing algorithms.

3. Introduced an efficient online bootstrapping method for posterior sampling.
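
To make the online bootstrapping idea concrete, here is a minimal sketch of one common variant, in which each incoming observation is added to every bootstrap replicate a Poisson(1)-distributed number of times. This is an illustration under stated assumptions, not necessarily the paper's exact procedure: the class name `OnlineBootstrapMean`, the `prior_mean` pseudo-observation, and the choice of Poisson(1) weights are all hypothetical choices made for the example.

```python
import math
import random


def poisson1(rng):
    """Draw from Poisson(1) using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class OnlineBootstrapMean:
    """Keeps several bootstrap replicates of a running mean (hypothetical helper)."""

    def __init__(self, n_replicates, prior_mean=0.5, seed=0):
        self.rng = random.Random(seed)
        # Assumption: one pseudo-observation per replicate, so the mean is
        # defined before any real data arrives.
        self.sums = [prior_mean] * n_replicates
        self.counts = [1] * n_replicates

    def update(self, x):
        # Each replicate sees the new observation k ~ Poisson(1) times,
        # which mimics resampling with replacement in a streaming setting.
        for b in range(len(self.sums)):
            k = poisson1(self.rng)
            self.sums[b] += k * x
            self.counts[b] += k

    def sample(self):
        # Picking one replicate uniformly at random gives an approximate
        # posterior sample of the mean.
        b = self.rng.randrange(len(self.sums))
        return self.sums[b] / self.counts[b]
```

Calling `sample()` once per arm and acting greedily on the sampled means would give a bootstrap analogue of posterior sampling; the spread between replicates shrinks as more observations arrive.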

Abstract

While trading off exploration and exploitation in reinforcement learning is hard in general, relatively simple solutions exist under some formulations. In this paper, we first derive upper bounds for the utility of selecting different actions in the multi-armed bandit setting. Unlike the common statistical upper confidence bounds, these make the link between the planning horizon, uncertainty and the need for exploration explicit. The resulting algorithm can be seen as a generalisation of the classical Thompson sampling algorithm. We experimentally test these algorithms, as well as the $\epsilon$-greedy and value of perfect information heuristics. Finally, we also introduce the idea of bagging for reinforcement learning. By employing a version of online bootstrapping, we can efficiently sample from an approximate posterior distribution.
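
For orientation, the classical Thompson sampling baseline that the abstract refers to can be sketched as follows for a Bernoulli bandit. This is the textbook algorithm with a uniform Beta(1, 1) prior, not the paper's generalised thresholds; the function name `thompson_bernoulli` and its parameters are assumptions made for this example.

```python
import random


def thompson_bernoulli(true_means, horizon, seed=0):
    """Run classical Thompson sampling on a simulated Bernoulli bandit.

    true_means: success probability of each arm (known only to the simulator).
    Returns the total reward and the number of pulls per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample of each arm's mean from its Beta posterior
        # (Beta(1, 1) prior, an assumption of this sketch).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Act greedily with respect to the sampled means.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    pulls = [successes[i] + failures[i] for i in range(n_arms)]
    return total_reward, pulls
```

For example, `thompson_bernoulli([0.1, 0.9], 1000)` simulates 1000 pulls on a two-armed bandit. Note that this baseline never consults the horizon when choosing an arm, whereas the bounds described in the abstract link the planning horizon to the need for exploration.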

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems