Achieving Limited Adaptivity for Multinomial Logistic Bandits
Sukruta Prakash Midigeshi, Tanmay Goyal, Gaurav Sinha

TL;DR
This paper introduces two algorithms for multinomial logistic bandits that operate with limited policy updates, achieving near-optimal regret and demonstrating practical efficiency in both stochastic and adversarial contexts.
Contribution
The paper proposes the B-MNL-CB and RS-MNL algorithms, which operate with limited adaptivity, extend optimal design concepts to the multinomial setting, and achieve low regret with few policy updates.
Findings
Algorithms achieve $\tilde{O}(\sqrt{T})$ regret with limited updates
B-MNL-CB extends distributional optimal designs to multinomial settings
RS-MNL performs well with adversarial contexts and few updates
Abstract
Multinomial logistic bandits have recently attracted much attention due to their ability to model problems with multiple outcomes. In this setting, each decision is associated with many possible outcomes, modeled using a multinomial logit function. Several recent works on multinomial logistic bandits have simultaneously achieved optimal regret and computational efficiency. However, motivated by real-world challenges and practicality, there is a need to develop algorithms with limited adaptivity, wherein only a limited number of policy updates are allowed. To address these challenges, we present two algorithms, B-MNL-CB and RS-MNL, that operate in the batched and rarely-switching paradigms, respectively. The batched setting involves choosing the policy update rounds at the start of the algorithm, while the rarely-switching setting can choose these policy update rounds in an adaptive fashion.…
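To make the abstract's terms concrete, here is a minimal Python sketch of a multinomial logit outcome model and of the two update schedules. It is not the paper's B-MNL-CB or RS-MNL. The function names, the baseline-outcome convention, the evenly spaced batch grid, and the scalar "information doubling" trigger are all assumptions chosen for illustration.

```python
import math


def mnl_probabilities(x, thetas):
    """Multinomial logit probabilities for outcomes 0..K.

    x: feature vector (list of floats).
    thetas: list of K parameter vectors, one per non-baseline outcome.
    Outcome 0 is a baseline with utility fixed at 0 (assumed convention).
    """
    utilities = [0.0] + [
        sum(xi * ti for xi, ti in zip(x, theta)) for theta in thetas
    ]
    shift = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def batched_update_rounds(horizon, num_batches):
    """Batched paradigm: update rounds are fixed before any data is seen.

    An evenly spaced grid is a hypothetical choice made only for this sketch.
    """
    return [horizon * m // num_batches for m in range(1, num_batches)]


def rarely_switching_update_rounds(features, lam=1.0, factor=2.0):
    """Rarely-switching paradigm: update rounds are chosen adaptively.

    Simplified to scalar features: the policy is updated whenever the
    accumulated information lam + sum(x^2) has grown by `factor` since the
    last update. This trigger is an assumed stand-in, not the paper's rule.
    """
    info = lam
    info_at_last_update = lam
    rounds = []
    for t, x in enumerate(features, start=1):
        info += x * x
        if info >= factor * info_at_last_update:
            rounds.append(t)
            info_at_last_update = info
    return rounds


if __name__ == "__main__":
    print(mnl_probabilities([1.0, 0.0], [[0.0, 1.0], [0.0, -1.0]]))
    print(batched_update_rounds(100, 4))
    print(rarely_switching_update_rounds([1.0] * 16))
```

In this sketch the batched schedule depends only on the horizon and the batch count, while the rarely-switching schedule depends on the observed features, which is the distinction the abstract draws between the two paradigms.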
