Bandits with Delayed, Aggregated Anonymous Feedback
Ciara Pike-Burke, Shipra Agrawal, Csaba Szepesvári, Steffen Grünewälder

TL;DR
This paper investigates a complex bandit problem where rewards are delayed, aggregated, and anonymous, proposing algorithms that achieve regret bounds comparable to non-anonymous settings under certain delay conditions.
Contribution
It introduces algorithms for bandits with delayed, aggregated anonymous feedback that match non-anonymous regret bounds when the delays are bounded and the expected delay (or a bound on it) is known.
Findings
Regret increases only additively with expected delay in the anonymous setting.
Algorithms match non-anonymous regret bounds under bounded delays.
Performance degrades gracefully with unbounded delays, up to logarithmic factors.
Abstract
We study a variant of the stochastic K-armed bandit problem, which we call "bandits with delayed, aggregated anonymous feedback". In this problem, when the player pulls an arm, a reward is generated; however, it is not immediately observed. Instead, at the end of each round the player observes only the sum of a number of previously generated rewards which happen to arrive in the given round. The rewards are stochastically delayed and, due to the aggregated nature of the observations, the information of which arm led to a particular reward is lost. The question is: what is the cost of the information loss due to this delayed, aggregated anonymous feedback? Previous works have studied bandits with stochastic, non-anonymous delays and found that the regret increases only by an additive factor relating to the expected delay. In this paper, we show that this additive regret increase can be…
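The feedback model described in the abstract can be illustrated with a short simulation. This is only a sketch under stated assumptions, not the paper's algorithm or experimental setup: the function name `simulate_feedback`, the Bernoulli rewards, and the uniform delay on {0, ..., max_delay} are all hypothetical choices made for illustration. It shows how each round's observation is a single sum of whichever earlier rewards happen to arrive, with no record of which arm produced them.

```python
import random
from collections import defaultdict


def simulate_feedback(means, pulls, max_delay, seed=0):
    """Simulate delayed, aggregated anonymous feedback for a fixed pull sequence.

    means     -- Bernoulli mean of each arm (assumed reward model)
    pulls     -- arm index pulled in each round
    max_delay -- delays are drawn uniformly from {0, ..., max_delay} (assumption)

    Returns (rewards, observations): the rewards generated in each round
    (hidden from the player) and the aggregated sum the player observes.
    """
    rng = random.Random(seed)
    horizon = len(pulls)
    arriving = defaultdict(float)  # round index -> sum of rewards arriving then
    rewards = []
    for t, arm in enumerate(pulls):
        reward = 1.0 if rng.random() < means[arm] else 0.0
        delay = rng.randint(0, max_delay)  # inclusive on both ends
        rewards.append(reward)
        # The reward is added to the total for its arrival round; the arm
        # identity is not stored, so the observation is anonymous.
        arriving[t + delay] += reward
    # Rewards scheduled to arrive after the horizon are never observed.
    observations = [arriving[t] for t in range(horizon)]
    return rewards, observations


if __name__ == "__main__":
    means = [0.2, 0.8]
    pulls = [t % 2 for t in range(10)]  # alternate between the two arms
    rewards, observations = simulate_feedback(means, pulls, max_delay=3)
    print("generated:", rewards)
    print("observed: ", observations)
```

With `max_delay=0` the observations coincide with the generated rewards, which recovers the standard bandit setting; with larger delays, an observation may mix rewards from several arms or be zero even when a reward was generated in that round.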
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Game Theory and Applications
