Approximate information for efficient exploration-exploitation strategies

Alex Barbier-Chebbah (IP; CNRS; UPCité); Christian L. Vestergaard (IP; CNRS; UPCité); Jean-Baptiste Masson (IP; CNRS; UPCité)

arXiv:2307.01563·stat.ML·July 6, 2023

TL;DR

This paper introduces AIM, a new algorithm for multi-armed bandit problems that approximates information gain to improve exploration efficiency. It matches the performance of Infomax and Thompson sampling while being faster and more deterministic.

Contribution

The paper presents AIM, a novel approximate information maximization algorithm that enhances computational efficiency and robustness in exploration-exploitation tasks.

Findings

01

AIM matches the performance of Infomax and Thompson sampling.

02

AIM is faster and more deterministic than existing methods.

03

Empirical results indicate that AIM complies with the Lai-Robbins asymptotic bound.
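For context, the Lai-Robbins bound is the classical asymptotic lower bound on regret: for Bernoulli arms with means μ_a and best mean μ*, any uniformly good policy satisfies liminf E[R(T)] / ln T ≥ Σ_{a: μ_a < μ*} (μ* − μ_a) / KL(μ_a, μ*), where KL is the Kullback-Leibler divergence between Bernoulli distributions. The minimal sketch below computes that constant. It is the standard textbook result, not code from the paper; the function names and the arm means are hypothetical and chosen only for illustration.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Ber(p) || Ber(q)), treating 0 * log(0) as 0 and clamping q away from 0 and 1."""
    eps = 1e-15
    q = min(max(q, eps), 1 - eps)
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


def lai_robbins_constant(means: list[float]) -> float:
    """Coefficient C with liminf E[R(T)] / ln T >= C (hypothetical helper, Bernoulli arms only)."""
    best = max(means)
    # Only strictly suboptimal arms contribute: gap divided by KL to the best arm.
    return sum((best - mu) / bernoulli_kl(mu, best) for mu in means if mu < best)


means = [0.9, 0.8, 0.5]  # illustrative arm means, not taken from the paper
c = lai_robbins_constant(means)
print(f"Lai-Robbins constant: {c:.3f}")  # prints: Lai-Robbins constant: 3.035
```

On these illustrative arms, any uniformly good policy must therefore accumulate regret growing at least like roughly 3.04 ln T. Complying with the bound means an algorithm's regret grows at this logarithmic rate.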

Abstract

This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multi-armed bandit problems. In these problems, an agent must decide whether to exploit current knowledge for immediate gains or to explore new options for potential long-term rewards. Here we introduce a novel algorithm, approximate information maximization (AIM), which uses an analytical approximation of the entropy gradient to choose which arm to pull at each time step. AIM matches the performance of Infomax and Thompson sampling while also offering greater computational speed, determinism, and tractability. Empirical evaluation of AIM indicates that it complies with the Lai-Robbins asymptotic bound and is robust across a range of priors. Its expression is tunable, which allows it to be optimized for specific settings.
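Thompson sampling, one of the baselines AIM is compared against, is simple to state for Bernoulli arms. Keep a Beta posterior for each arm, draw one sample from every posterior, and pull the arm with the largest sample. The minimal sketch below implements that standard baseline. It is not AIM, whose entropy-gradient approximation is not reproduced here. The function name, arm means, horizon, and seed are hypothetical choices for illustration. Its random posterior draws are the source of the non-determinism that the abstract contrasts with AIM.

```python
import random


def thompson_bernoulli(means: list[float], horizon: int, seed: int = 0) -> tuple[list[int], float]:
    """Beta-Bernoulli Thompson sampling (hypothetical helper); returns pull counts and pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # number of reward-1 outcomes per arm
    failures = [0] * k   # number of reward-0 outcomes per arm
    pulls = [0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Draw a plausible mean for each arm from its Beta(1 + s, 1 + f) posterior (uniform prior).
        samples = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the chosen arm's true mean.
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        regret += best - means[arm]  # expected regret of this choice
    return pulls, regret


pulls, regret = thompson_bernoulli([0.9, 0.8, 0.5], horizon=5000)
print(pulls, round(regret, 1))
```

Fixing the seed makes a run reproducible, but each individual decision still depends on random posterior samples. A deterministic index policy avoids this by design.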

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms