Meta-Learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case

Francis Maes; Damien Ernst; Louis Wehenkel

arXiv:1207.5208·cs.AI·July 24, 2012

TL;DR

This paper introduces a meta-learning framework for exploration/exploitation (E/E) strategies in multi-armed bandit problems. It uses prior knowledge about a specific class of problems to optimize a strategy for that class, and the resulting strategies outperform standard methods such as UCB variants.

Contribution

The paper proposes a systematic approach to incorporating prior knowledge into E/E strategies in three steps: modeling that knowledge, choosing a hypothesis space of candidate strategies, and optimizing over that space. It gives implementations for parameterized and symbolic strategies.

Findings

01

Meta-learned strategies outperform standard algorithms like UCB variants.

02

The meta-learned strategies remain robust across different reward distributions.

03

Optimization algorithms effectively tailor strategies to specific problem classes.

Abstract

The exploration/exploitation (E/E) dilemma arises naturally in many subfields of science. Multi-armed bandit problems formalize this dilemma in its canonical form. Most current research in this field focuses on generic solutions that can be applied to a wide range of problems. However, in practice, it is often the case that a form of prior information is available about the specific class of target problems. Prior knowledge is rarely used in current solutions due to the lack of a systematic approach to incorporate it into the E/E strategy. To address a specific class of E/E problems, we propose to proceed in three steps: (i) model prior knowledge in the form of a probability distribution over the target class of E/E problems; (ii) choose a large hypothesis space of candidate E/E strategies; and (iii) solve an optimization problem to find a candidate E/E strategy of maximal average…
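
The three steps in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: the prior is taken to be two-armed Bernoulli bandits with means drawn uniformly from [0, 1], the hypothesis space is a UCB-style index policy with a single tunable exploration constant `c`, and the optimizer is a plain grid search that minimizes Monte Carlo average regret. The function names (`sample_problem`, `index_policy_regret`, `meta_learn`) are hypothetical.

```python
import math
import random


def sample_problem(rng, n_arms=2):
    """Step (i): draw one bandit problem from the ASSUMED prior
    (Bernoulli arms whose means are uniform on [0, 1])."""
    return [rng.random() for _ in range(n_arms)]


def index_policy_regret(means, c, horizon, rng):
    """Step (ii): one member of the ASSUMED hypothesis space, a UCB-style
    index policy with exploration constant c. Returns the cumulative
    expected regret over `horizon` plays on the given problem."""
    n_arms = len(means)
    counts = [0] * n_arms   # number of times each arm was played
    sums = [0.0] * n_arms   # total reward collected from each arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play each arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(c * math.log(t) / counts[k]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def meta_learn(candidates, n_problems=200, horizon=100, seed=0):
    """Step (iii): pick the candidate c with the lowest average regret
    over problems sampled from the prior (grid search, an assumption)."""
    rng = random.Random(seed)
    problems = [sample_problem(rng) for _ in range(n_problems)]
    scores = {}
    for c in candidates:
        play_rng = random.Random(seed + 1)  # same reward noise stream per candidate
        total = sum(index_policy_regret(m, c, horizon, play_rng) for m in problems)
        scores[c] = total / n_problems
    best_c = min(scores, key=scores.get)
    return best_c, scores


if __name__ == "__main__":
    best_c, scores = meta_learn([0.0, 0.25, 0.5, 1.0, 2.0])
    print("selected exploration constant:", best_c)
    for c, s in sorted(scores.items()):
        print(f"c={c}: average regret {s:.3f}")
```

The selected constant depends on the assumed prior and horizon, which is the point of the approach: a different problem class would generally yield a different tuned strategy. A richer hypothesis space or a stronger optimizer could be substituted without changing the overall structure.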
