Infomax strategies for an optimal balance between exploration and exploitation

Gautam Reddy; Antonio Celani; Massimo Vergassola

arXiv:1601.03073 · cs.LG · May 25, 2016

TL;DR

This paper shows that an Infomax strategy, Info-p, effectively balances exploration and exploitation in multi-armed bandit problems: by gathering information about the highest mean reward among the arms, it saturates known optimal bounds.

Contribution

The study introduces and validates an Infomax-based policy, Info-p, that optimally balances exploration and exploitation in multi-armed bandit problems and compares favorably with existing policies.
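The idea of gathering information about the highest mean reward can be illustrated with a minimal sketch. It assumes Bernoulli arms with independent Beta(1, 1) priors, represents the posterior of the highest mean mu* = max_k p_k on a grid, and picks the arm whose next pull minimizes the expected posterior entropy of mu* one step ahead. The names `max_mean_entropy` and `infomax_arm`, the grid size, and the one-step greedy lookahead are illustrative assumptions; the paper's Info-p policy may define its objective or lookahead differently.

```python
import numpy as np

# Hypothetical discretization of [0, 1] used to represent posteriors numerically.
N_GRID = 2000
X = (np.arange(N_GRID) + 0.5) / N_GRID  # cell midpoints, strictly inside (0, 1)
DX = 1.0 / N_GRID


def beta_cdf(a: float, b: float) -> np.ndarray:
    """Numerical CDF of Beta(a, b) evaluated at the right edge of each grid cell."""
    logp = (a - 1.0) * np.log(X) + (b - 1.0) * np.log1p(-X)
    p = np.exp(logp - logp.max())  # rescale before exponentiating to avoid underflow
    p /= p.sum()                   # probability mass per cell
    return np.cumsum(p)


def max_mean_entropy(successes, failures) -> float:
    """Differential entropy (nats) of the posterior of mu* = max_k p_k.

    Arms are assumed independent, so P(mu* <= x) = prod_k P(p_k <= x).
    """
    cdf = np.ones(N_GRID)
    for s, f in zip(successes, failures):
        cdf *= beta_cdf(s + 1.0, f + 1.0)  # Beta(1, 1) prior updated with counts
    mass = np.diff(cdf, prepend=0.0)       # posterior mass of mu* in each cell
    mass = mass[mass > 0]
    return float(-np.sum(mass * np.log(mass / DX)))


def infomax_arm(successes, failures) -> int:
    """Hypothetical one-step infomax rule: minimize expected entropy of mu*."""
    best_arm, best_h = 0, np.inf
    for k in range(len(successes)):
        # Posterior predictive probability that arm k pays off on the next pull.
        q = (successes[k] + 1.0) / (successes[k] + failures[k] + 2.0)
        s_up = list(successes)
        s_up[k] += 1
        f_up = list(failures)
        f_up[k] += 1
        h = q * max_mean_entropy(s_up, failures) + (1.0 - q) * max_mean_entropy(successes, f_up)
        if h < best_h:
            best_arm, best_h = k, h
    return best_arm
```

For example, `infomax_arm([50, 0, 1], [50, 20, 1])` does not choose arm 1: its posterior lies well below the other arms, so another pull of it reveals almost nothing about mu*. Note that a rule like this targets information about the best reward value rather than about every arm's probability, which is the distinction the summary emphasizes.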

Findings

1. Info-p saturates known optimal bounds.
2. Info-p compares favorably to existing policies.
3. Focusing on the highest mean reward enables optimal tradeoffs.
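The summary does not state which optimal bounds are meant. Assuming they are the classical asymptotic lower bounds of Lai and Robbins for Bernoulli arms, any consistent policy must have regret R(T) = T mu* - E[total reward] satisfying liminf R(T)/ln T >= sum over suboptimal arms k of (mu* - p_k) / KL(p_k, mu*), where KL is the Kullback-Leibler divergence between Bernoulli distributions. The sketch below (function names are hypothetical) computes that coefficient for a given set of success probabilities, all assumed to lie strictly between 0 and 1.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) in nats; assumes 0 < q < 1."""
    kl = 0.0
    if p > 0.0:
        kl += p * math.log(p / q)
    if p < 1.0:
        kl += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return kl


def lai_robbins_constant(probs) -> float:
    """Coefficient of ln T in the assumed Lai-Robbins lower bound on regret.

    Only arms strictly worse than the best one contribute.
    """
    best = max(probs)
    return sum((best - p) / bernoulli_kl(p, best) for p in probs if p < best)
```

A policy "saturates" such a bound when its regret grows as this coefficient times ln T for large T, so no consistent policy can do asymptotically better.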

Abstract

Proper balance between exploitation and exploration is what makes good decisions, which achieve high rewards like payoff or evolutionary fitness. The Infomax principle postulates that maximization of information directs the function of diverse systems, from living systems to artificial neural networks. While specific applications are successful, the validity of information as a proxy for reward remains unclear. Here, we consider the multi-armed bandit decision problem, which features arms (slot-machines) of unknown probabilities of success and a player trying to maximize cumulative payoff by choosing the sequence of arms to play. We show that an Infomax strategy (Info-p) which optimally gathers information on the highest mean reward among the arms saturates known optimal bounds and compares favorably to existing policies. The highest mean reward considered by Info-p is not the quantity…
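The decision problem described in the abstract can be simulated in a few lines. This is a minimal sketch, assuming Bernoulli arms and measuring performance by expected (pseudo-)regret, i.e. the cumulative gap between the best arm's success probability and that of the arm actually played. The harness `run_bandit` is hypothetical, and Thompson sampling with Beta(1, 1) priors is used only as a familiar example of an existing policy; it is not Info-p.

```python
import numpy as np


def run_bandit(probs, horizon, policy, rng):
    """Play a Bernoulli bandit for `horizon` rounds; return the cumulative regret curve."""
    k = len(probs)
    successes = np.zeros(k)
    failures = np.zeros(k)
    best = max(probs)
    regret = np.empty(horizon)
    total = 0.0
    for t in range(horizon):
        arm = policy(successes, failures, rng)
        reward = int(rng.random() < probs[arm])  # 1 on success, 0 on failure
        successes[arm] += reward
        failures[arm] += 1 - reward
        total += best - probs[arm]               # expected regret of this choice
        regret[t] = total
    return regret


def thompson(successes, failures, rng):
    """Thompson sampling: play the arm with the largest posterior sample."""
    return int(np.argmax(rng.beta(successes + 1.0, failures + 1.0)))


def uniform_random(successes, failures, rng):
    """Pure exploration baseline: pick an arm uniformly at random."""
    return int(rng.integers(len(successes)))
```

For instance, `run_bandit([0.9, 0.1], 2000, thompson, np.random.default_rng(0))` accumulates far less regret than the same call with `uniform_random`, which explores forever and therefore loses about half of the 0.8 gap per round on average. Any policy with the same `(successes, failures, rng)` signature can be plugged into the harness.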
