Rising Rested MAB with Linear Drift
Omer Amichay, Yishay Mansour

TL;DR
This paper studies a non-stationary (rising rested) multi-armed bandit in which each arm's expected reward is a linear function of the number of times that arm has been played. It proves matching upper and lower regret bounds and derives instance-dependent bounds.
Contribution
The paper gives matching upper and lower regret bounds for the linear drift MAB model and extends the analysis to instance-dependent regret, which depends on the unknown parameters of each arm's drift.
Findings
Tight regret bound of $\tilde{\Theta}(T^{4/5}K^{3/5})$, where $T$ is the horizon and $K$ is the number of arms
Extension to instance-dependent regret bounds that depend on the unknown parameters of the linear drift
Provides both upper and lower bounds for the regret in linear drift MABs
Abstract
We consider non-stationary multi-arm bandit (MAB) where the expected reward of each action follows a linear function of the number of times we executed the action. Our main result is a tight regret bound of $\tilde{\Theta}(T^{4/5}K^{3/5})$, by providing both upper and lower bounds. We extend our results to derive instance-dependent regret bounds, which depend on the unknown parametrization of the linear drift of the rewards.
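The reward model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' algorithm or notation: the class name `LinearDriftBandit`, the parametrization of arm `i` as `intercepts[i] + slopes[i] * n` (with `n` the number of previous pulls of that arm, starting at 0), the Gaussian noise, and the choice of "best single arm played for the whole horizon" as the benchmark are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinearDriftBandit:
    """Rested bandit: an arm's mean changes only when that arm is pulled."""
    slopes: List[float]      # hypothetical per-arm drift (>= 0 for "rising")
    intercepts: List[float]  # hypothetical per-arm mean before any pull
    noise_std: float = 0.0   # assumed Gaussian noise; 0.0 gives exact means
    pulls: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.pulls = [0] * len(self.slopes)

    def mean(self, arm: int, n: int) -> float:
        # Expected reward of `arm` after it has already been pulled n times.
        return self.intercepts[arm] + self.slopes[arm] * n

    def pull(self, arm: int, rng: random.Random) -> float:
        # Draw a reward at the arm's current mean, then advance its counter.
        mu = self.mean(arm, self.pulls[arm])
        self.pulls[arm] += 1
        return mu + rng.gauss(0.0, self.noise_std)


def single_arm_value(env: LinearDriftBandit, arm: int, horizon: int) -> float:
    # Closed form of sum_{n=0}^{horizon-1} (intercept + slope * n).
    return (env.intercepts[arm] * horizon
            + env.slopes[arm] * horizon * (horizon - 1) / 2)


def best_single_arm(env: LinearDriftBandit, horizon: int) -> int:
    # Arm with the largest expected total reward if played every round.
    return max(range(len(env.slopes)),
               key=lambda arm: single_arm_value(env, arm, horizon))
```

With `slopes=[0.0001, 0.0009]` and `intercepts=[0.4, 0.05]`, the arm that starts higher is the better choice over a short horizon, while the faster-rising arm overtakes it over a long one. This shows why the horizon matters when deciding which arm to commit to under this sketch's assumptions.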
