The price of unfairness in linear bandits with biased feedback
Solenne Gaucher (CELESTE), Alexandra Carpentier, Christophe Giraud (CELESTE)

TL;DR
This paper investigates fair decision making in linear bandits with biased feedback, proposing an algorithm that corrects bias and analyzing its regret bounds, revealing regimes of varying difficulty.
Contribution
It introduces a phased elimination algorithm that corrects the biased evaluations in linear bandits, establishes upper bounds on its regret, and proves lower bounds for some sets of actions, highlighting the impact of bias on problem difficulty.
Findings
Regret bounds depend on an explicit geometrical constant κ_* that characterizes the difficulty of bias estimation
The worst-case regret is at most O(κ_*^{1/3} log(T)^{1/3} T^{2/3})
Problem difficulty undergoes a transition that depends on the severity of the bias
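To give a feel for the worst-case rate listed above, the following minimal sketch evaluates κ_*^{1/3} log(T)^{1/3} T^{2/3} with all constants omitted and prints it next to √T, the horizon dependence usually associated with standard linear bandits. The function name `worst_case_rate` and the value κ_* = 1 are illustrative assumptions, not taken from the paper.

```python
import math


def worst_case_rate(kappa, T):
    """Evaluate kappa^(1/3) * log(T)^(1/3) * T^(2/3), constants omitted.

    Hypothetical helper for illustration only; kappa stands in for the
    geometrical constant kappa_* and T for the horizon.
    """
    return (kappa * math.log(T)) ** (1.0 / 3.0) * T ** (2.0 / 3.0)


# Compare the growth of the rate with sqrt(T) for a few horizons.
# kappa = 1.0 is an arbitrary placeholder value.
for T in (10**3, 10**4, 10**5):
    print(T, round(worst_case_rate(1.0, T)), round(math.sqrt(T)))
```

Because κ_* enters through a cube root, multiplying it by 8 only doubles the value of the bound.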
Abstract
In this paper, we study the problem of fair sequential decision making with biased linear bandit feedback. At each round, a player selects an action described by a covariate and by a sensitive attribute. The perceived reward is a linear combination of the covariates of the chosen action, but the player only observes a biased evaluation of this reward, depending on the sensitive attribute. To characterize the difficulty of this problem, we design a phased elimination algorithm that corrects the unfair evaluations, and establish upper bounds on its regret. We show that the worst-case regret is smaller than O(κ_*^{1/3} log(T)^{1/3} T^{2/3}), where κ_* is an explicit geometrical constant characterizing the difficulty of bias estimation. We prove lower bounds on the worst-case regret for some sets of actions showing that this rate is tight up to a possible…
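The feedback model described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: it assumes the bias enters additively as `bias * group` with a binary sensitive attribute `group` in {-1, +1}, and the names `biased_feedback`, `theta`, and `bias` as well as all numeric values are hypothetical.

```python
import random


def biased_feedback(covariate, group, theta, bias, noise_sd=0.0, rng=random):
    """Return (true_reward, observed_evaluation) for one action.

    Assumption (not stated in the abstract): the bias is additive and
    signed by the sensitive attribute, i.e. observed = reward + bias * group,
    with group in {-1, +1}, plus Gaussian noise.
    """
    # True reward: linear combination of the covariates.
    true_reward = sum(c * t for c, t in zip(covariate, theta))
    # The player only sees this biased, noisy evaluation.
    observed = true_reward + bias * group + rng.gauss(0.0, noise_sd)
    return true_reward, observed


theta = [1.0, 0.5]  # hypothetical unknown reward parameter
bias = 0.8          # hypothetical unknown bias parameter
# Two actions: (covariate, sensitive attribute).
actions = [([1.0, 0.0], -1), ([0.9, 0.0], +1)]

for covariate, group in actions:
    reward, evaluation = biased_feedback(covariate, group, theta, bias)
    print(covariate, group, reward, evaluation)
```

With the noise switched off, the first action has the higher true reward but the lower observed evaluation, so a player that ranks actions by raw feedback would prefer the wrong one. This is why the bias has to be estimated and corrected.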
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Auction Theory and Applications
