Combinatorial Bandits under Strategic Manipulations
Jing Dong, Ke Li, Shuai Li, Baoxiang Wang

TL;DR
This paper studies combinatorial multi-armed bandits under strategic reward manipulations, proposing a robust algorithm with bounded regret, and validates its effectiveness through extensive experiments on real and synthetic data.
Contribution
It introduces a strategic variant of the combinatorial UCB algorithm that accounts for reward manipulations and provides theoretical regret bounds and lower bounds on manipulation budgets.
Findings
The proposed algorithm achieves regret of O(m log T + m B_{max}) under strategic manipulations.
It derives lower bounds on the budget that arms need in order to induce a given level of regret in the bandit algorithm.
Experimental results confirm robustness and regret bounds across various manipulation regimes.
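To make the setting concrete, here is a minimal Python sketch of a budget-aware combinatorial UCB loop. It is an illustration under stated assumptions, not the paper's algorithm: the index form (empirical mean plus a confidence radius widened by b_max / n), the top-k oracle, the Bernoulli rewards, and the greedy "spend the budget as early as possible" manipulation are all hypothetical choices, and every function name is invented for this example.

```python
import math
import random


def strategic_ucb_indices(means, counts, t, b_max):
    """Optimistic index per arm.

    Assumption (not taken from the paper): the usual UCB radius is widened
    by b_max / n, the largest shift a budget of b_max can cause in an
    empirical mean built from n observations.
    """
    indices = []
    for mu_hat, n in zip(means, counts):
        if n == 0:
            indices.append(float("inf"))  # force one initial pull of each arm
        else:
            radius = math.sqrt(3.0 * math.log(t) / (2.0 * n))
            indices.append(mu_hat + radius + b_max / n)
    return indices


def top_k_oracle(indices, k):
    """Hypothetical oracle for a linear reward: the k arms with the largest indices."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return order[:k]


def run_strategic_cucb(true_means, budgets, k, horizon, seed=0):
    """Simulate the loop and return how often each arm was pulled."""
    rng = random.Random(seed)
    m = len(true_means)
    b_max = max(budgets)
    remaining = list(budgets)  # manipulation budget left per arm
    means = [0.0] * m          # empirical means of the *observed* rewards
    counts = [0] * m
    for t in range(1, horizon + 1):
        chosen = top_k_oracle(strategic_ucb_indices(means, counts, t, b_max), k)
        for i in chosen:
            reward = 1.0 if rng.random() < true_means[i] else 0.0
            # Assumed manipulation strategy: inflate the signal by up to 1.0
            # per pull until the arm's budget runs out.
            boost = min(1.0, remaining[i])
            remaining[i] -= boost
            counts[i] += 1
            means[i] += (reward + boost - means[i]) / counts[i]
    return counts
```

For example, run_strategic_cucb([0.9, 0.8, 0.2, 0.1], [0, 0, 5, 5], k=2, horizon=2000) lets the two weak arms inflate their rewards early on. In this toy setup their pull counts stay small relative to the horizon once the budgets are spent, which matches the intuition behind a regret term that grows with B_{max} rather than with T.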
Abstract
Strategic behavior against sequential learning methods, such as "click framing" in real recommendation systems, has been widely observed. Motivated by such behavior, we study the problem of combinatorial multi-armed bandits (CMAB) under strategic manipulations of rewards, where each arm can modify the emitted reward signals for its own interest. This characterization of the adversarial behavior is a relaxation of previously well-studied settings such as adversarial attacks and adversarial corruption. We propose a strategic variant of the combinatorial UCB algorithm, which has a regret of at most O(m log T + m B_{max}) under strategic manipulations, where T is the time horizon, m is the number of arms, and B_{max} is the maximum budget of an arm. We provide lower bounds on the budget for arms to incur certain regret of the bandit algorithm. Extensive experiments on online worker…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Optimization and Search Problems
