Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits
Vishakha Patil, Vineet Nair, Ganesh Ghalme, Arindam Khan

TL;DR
This paper introduces an anytime algorithm for the Improving Multi-Armed Bandit problem that balances maximizing reward and ensuring fair opportunity distribution, with proven optimality in a horizon-unaware setting.
Contribution
The paper presents the first anytime algorithm for IMAB that achieves optimal reward while mitigating disparity, with proven bounds on regret and competitive ratio.
Findings
The algorithm achieves the optimal cumulative-reward guarantee for IMAB at any time, without knowing the horizon.
It effectively mitigates the initial disparity among arms.
The paper proves bounds on both regret and competitive ratio.
Abstract
We study the Improving Multi-Armed Bandit (IMAB) problem, where the reward obtained from an arm increases with the number of pulls it receives. This model provides an elegant abstraction for many real-world problems in domains such as education and employment, where decisions about the distribution of opportunities can affect the future capabilities of communities and the disparity between them. A decision-maker in such settings must consider the impact of her decisions on future rewards in addition to the standard objective of maximizing her cumulative reward at any time. In many of these applications, the time horizon is unknown to the decision-maker beforehand, which motivates the study of the IMAB problem in the technically more challenging horizon-unaware setting. We study the tension that arises between two seemingly conflicting objectives in the horizon-unaware setting: a)…
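The IMAB model described in the abstract can be illustrated with a small simulation. This is only a sketch of the setting, not the paper's algorithm. The reward curve in `make_arm`, its parameters (`start`, `cap`, `rate`), and the two baseline policies (`round_robin`, `commit_to`) are assumptions chosen for illustration. The only property taken from the abstract is that an arm's reward increases with the number of pulls it has received.

```python
from typing import Callable, List, Sequence

RewardFn = Callable[[int], float]
Policy = Callable[[int, Sequence[int]], int]


def make_arm(start: float, cap: float, rate: float) -> RewardFn:
    """Hypothetical improving arm: reward rises from `start` toward `cap`.

    The argument is the number of pulls the arm has received so far.
    With 0 < rate < 1 and cap >= start the reward is non-decreasing in
    the pull count, which is the defining IMAB property. The exact shape
    is an assumption, not taken from the paper.
    """
    def reward(pulls_so_far: int) -> float:
        return cap - (cap - start) * (rate ** pulls_so_far)
    return reward


def run_policy(arms: Sequence[RewardFn], choose: Policy, horizon: int) -> List[float]:
    """Simulate a policy and return the cumulative reward after each step."""
    pulls = [0] * len(arms)
    total = 0.0
    history: List[float] = []
    for t in range(horizon):
        i = choose(t, pulls)          # policy sees the step index and pull counts
        total += arms[i](pulls[i])    # reward depends on this arm's past pulls
        pulls[i] += 1
        history.append(total)
    return history


def round_robin(t: int, pulls: Sequence[int]) -> int:
    """Baseline that spreads pulls evenly across arms."""
    return t % len(pulls)


def commit_to(i: int) -> Policy:
    """Baseline that always pulls arm `i`."""
    return lambda t, pulls: i


# Arm 0 starts strong but barely improves; arm 1 starts weak but has high potential.
arms = [make_arm(0.5, 0.6, 0.9), make_arm(0.1, 1.0, 0.9)]
for name, policy in [("arm 0 only", commit_to(0)),
                     ("arm 1 only", commit_to(1)),
                     ("round robin", round_robin)]:
    hist = run_policy(arms, policy, horizon=50)
    print(f"{name}: after 1 step = {hist[0]:.2f}, after 50 steps = {hist[-1]:.2f}")
```

In this toy instance, committing to the initially stronger arm is better over very short horizons, while investing in the initially weaker arm pays off over longer ones. That trade-off is why not knowing the horizon in advance makes the problem harder.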
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications
