Smooth Sequential Optimisation with Delayed Feedback
Srivas Chennu, Jamie Martin, Puli Liyanagama, Phil Mohr

TL;DR
This paper introduces a reward estimation method for bandit algorithms that applies shrinkage to windowed cumulative inputs to handle delayed feedback, improving the stability and statistical accuracy of sequential optimisation.
Contribution
It proposes an adaptation of empirical Bayesian shrinkage that computes smoothed reward estimates from windowed cumulative inputs, to cope with incomplete knowledge from delayed feedback and with non-stationary rewards.
Findings
Retains the benefits of empirical Bayesian shrinkage while improving the stability of reward estimation by more than 50%.
Reduces variability in treatment allocations to the best arm by up to 3.8x.
Improves statistical accuracy, with up to 8% improvement in true positive rates and 37% reduction in false positive rates.
Abstract
Stochastic delays in feedback lead to unstable sequential learning using multi-armed bandits. Recently, empirical Bayesian shrinkage has been shown to improve reward estimation in bandit learning. Here, we propose a novel adaptation to shrinkage that estimates smoothed reward estimates from windowed cumulative inputs, to deal with incomplete knowledge from delayed feedback and non-stationary rewards. Using numerical simulations, we show that this adaptation retains the benefits of shrinkage, and improves the stability of reward estimation by more than 50%. Our proposal reduces variability in treatment allocations to the best arm by up to 3.8x, and improves statistical accuracy - with up to 8% improvement in true positive rates and 37% reduction in false positive rates. Together, these advantages enable control of the trade-off between speed and stability of adaptation, and facilitate…
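The abstract does not give the estimator itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes binary rewards, a beta-binomial prior fitted by the method of moments, and a fixed window of recent periods; the function names `windowed_counts` and `shrink_rates` and the parameters `window` and `eps` are hypothetical.

```python
import numpy as np


def windowed_counts(successes, trials, window):
    """Sum per-arm successes and trials over the most recent `window` periods.

    Both inputs have shape (periods, arms), oldest period first.
    Assumes window >= 1 and at least one trial per arm inside the window.
    """
    s = np.asarray(successes, dtype=float)[-window:].sum(axis=0)
    n = np.asarray(trials, dtype=float)[-window:].sum(axis=0)
    return s, n


def shrink_rates(s, n, eps=1e-12):
    """Shrink per-arm success rates towards the pooled rate (assumed beta-binomial prior)."""
    rates = s / n
    pooled = s.sum() / n.sum()
    # Between-arm variance: observed spread minus the average sampling noise.
    sampling_var = np.mean(pooled * (1.0 - pooled) / n)
    between_var = max(rates.var() - sampling_var, eps)
    # Method-of-moments prior strength k = alpha + beta, clamped to be non-negative.
    k = max(pooled * (1.0 - pooled) / between_var - 1.0, 0.0)
    # Posterior mean: a weighted average of each raw rate and the pooled rate.
    return (s + k * pooled) / (n + k)


# Illustrative data only: 3 periods x 3 arms, 100 trials per arm per period.
successes = [[5, 8, 2], [6, 9, 3], [4, 12, 1]]
trials = [[100, 100, 100]] * 3
s, n = windowed_counts(successes, trials, window=2)
print(shrink_rates(s, n))
```

In this sketch a shorter window reacts faster to recent rewards but leaves fewer observations per arm, so the estimates are pulled harder towards the pooled rate; a longer window does the opposite. That is one way to picture the trade-off between speed and stability of adaptation that the abstract mentions.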
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference
