Approximate information for efficient exploration-exploitation strategies
Alex Barbier-Chebbah (IP, CNRS, UPCité), Christian L. Vestergaard (IP, CNRS, UPCité), Jean-Baptiste Masson (IP, CNRS, UPCité)

TL;DR
This paper introduces AIM, a new algorithm for multi-armed bandit problems that uses an analytical approximation of the entropy gradient to choose which arm to pull. It matches the performance of Infomax and Thompson sampling while offering greater computational speed, determinism, and tractability.
Contribution
The paper presents AIM, a novel approximate information maximization algorithm that enhances computational efficiency and robustness in exploration-exploitation tasks.
Findings
AIM matches Infomax and Thompson sampling performance.
AIM offers greater computational speed, determinism, and tractability than Infomax and Thompson sampling.
Empirical results indicate that AIM complies with the Lai-Robbins asymptotic bound.
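For reference, the Lai-Robbins bound mentioned above can be written as follows for Bernoulli arms. The notation is an assumption of this summary, not taken from the paper: R_T is the cumulative regret after T pulls, \mu_i is the mean reward of arm i, and \mu^* is the largest mean. The bound applies to any uniformly good (consistent) policy.

```latex
\liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\ln T}
  \;\ge\; \sum_{i \,:\, \mu_i < \mu^*} \frac{\mu^* - \mu_i}{\mathrm{KL}(\mu_i \,\|\, \mu^*)},
\qquad
\mathrm{KL}(p \,\|\, q) = p \ln\frac{p}{q} + (1 - p) \ln\frac{1 - p}{1 - q}.
```

An algorithm whose expected regret grows as this constant times \ln T is asymptotically optimal in this sense.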
Abstract
This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multi-armed bandit problems. These problems involve an agent deciding whether to exploit current knowledge for immediate gains or to explore new avenues for potential long-term rewards. We introduce a novel algorithm, approximate information maximization (AIM), which employs an analytical approximation of the entropy gradient to choose which arm to pull at each point in time. AIM matches the performance of Infomax and Thompson sampling while also offering enhanced computational speed, determinism, and tractability. Empirical evaluation of AIM indicates its compliance with the Lai-Robbins asymptotic bound and demonstrates its robustness for a range of priors. Its expression is tunable, which allows for specific optimization in various settings.
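To make the setting concrete, here is a minimal sketch of Thompson sampling, one of the baselines named in the abstract, on a Bernoulli bandit. It is not AIM: the AIM expression is not given on this page, so no attempt is made to reproduce it. The function name `thompson_bernoulli`, the uniform Beta(1, 1) prior, and the example arm means are illustrative assumptions.

```python
import random


def thompson_bernoulli(means, horizon, seed=0):
    """Thompson sampling on a Bernoulli bandit (baseline sketch, not AIM).

    `means` holds the true success probabilities, which the agent never sees
    directly; they are used only to draw rewards and to measure regret.
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # observed rewards of 1 per arm
    failures = [0] * k   # observed rewards of 0 per arm
    best = max(means)
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior (uniform prior assumed).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        # Pull the arm whose sampled mean is largest.
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        # Expected regret of this pull relative to always playing the best arm.
        regret += best - means[arm]

    pulls = [successes[i] + failures[i] for i in range(k)]
    return regret, pulls


regret, pulls = thompson_bernoulli([0.3, 0.7], horizon=2000)
print(f"cumulative regret: {regret:.1f}, pulls per arm: {pulls}")
```

The posterior sampling step is what makes this baseline randomized: two runs with different seeds can pull different arms from the same history, which is the behavior the abstract contrasts with AIM's determinism.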
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
