Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback
Tanmay Goyal, Gaurav Sinha

TL;DR
This paper introduces efficient algorithms for the Logistic Contextual Slate Bandit problem, achieving low regret and fast runtime, with applications to language model prompt selection.
Contribution
The paper proposes two scalable algorithms, Slate-GLM-OFU and Slate-GLM-TS, with theoretical guarantees and practical effectiveness for large-scale slate bandit problems.
Findings
Algorithms achieve $\tilde{O}(\sqrt{T})$ regret under diversity assumptions.
Methods outperform baselines in synthetic experiments on both regret and runtime.
Application to language models shows competitive accuracy in prompt selection.
Abstract
We study the Logistic Contextual Slate Bandit problem, where, at each round, an agent selects a slate of items from an exponentially large set of candidate slates provided by the environment. A single binary reward, determined by a logistic model, is observed for the chosen slate. Our objective is to develop algorithms that maximize cumulative reward over $T$ rounds while maintaining low per-round computational costs. We propose two algorithms, Slate-GLM-OFU and Slate-GLM-TS, that accomplish this goal. These algorithms achieve low per-round time complexity via local planning (independent slot selections), and low regret through global learning (joint parameter estimation). We provide theoretical and empirical evidence supporting these claims. Under a well-studied diversity assumption, we prove that Slate-GLM-OFU incurs only …
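To make "local planning" concrete, here is a minimal Python sketch of why independent slot selections can stand in for a search over all slates. It is not Slate-GLM-OFU or Slate-GLM-TS: it leaves out exploration (optimism or posterior sampling) and parameter estimation entirely. It assumes, for illustration only, that the slate score is a sum of per-slot linear scores passed through a sigmoid; the names `theta_blocks`, `items`, `plan_per_slot`, and `plan_brute_force`, and all sizes, are hypothetical.

```python
import itertools

import numpy as np


def sigmoid(z):
    """Logistic link mapping a real-valued score to a reward probability."""
    return 1.0 / (1.0 + np.exp(-z))


def plan_per_slot(theta_blocks, items):
    """Local planning: pick the best item in each slot independently.

    Cost grows linearly with the number of slots.
    """
    return tuple(int(np.argmax(X @ th)) for th, X in zip(theta_blocks, items))


def plan_brute_force(theta_blocks, items):
    """Enumerate every slate; cost grows exponentially with the number of slots.

    The sigmoid is monotone, so the slate with the highest linear score
    also has the highest reward probability.
    """
    best_slate, best_score = None, -np.inf
    for slate in itertools.product(*(range(len(X)) for X in items)):
        score = sum(X[i] @ th for i, th, X in zip(slate, theta_blocks, items))
        if score > best_score:
            best_slate, best_score = slate, score
    return best_slate, float(sigmoid(best_score))


# Hypothetical problem sizes, chosen only for illustration.
rng = np.random.default_rng(0)
n_slots, n_items, dim = 4, 5, 3
theta_blocks = [rng.normal(size=dim) for _ in range(n_slots)]    # assumed per-slot parameters
items = [rng.normal(size=(n_items, dim)) for _ in range(n_slots)]  # assumed item features

local_slate = plan_per_slot(theta_blocks, items)
global_slate, global_p = plan_brute_force(theta_blocks, items)

# Bandit feedback: one binary reward for the whole slate.
reward = int(rng.random() < global_p)
```

Under this additive assumption both planners return the same slate, but the per-slot version evaluates `n_slots * n_items` scores while the enumeration evaluates `n_items ** n_slots` slates. The single `reward` bit is why the parameters still have to be learned jointly across slots.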
