Invariance-Based Dynamic Regret Minimization
Margherita Lazzaretto, Jonas Peters, Niklas Pfister

TL;DR
This paper introduces ISD-linUCB, an algorithm for non-stationary linear bandits that learns the invariant (stationary) part of the reward model from historical data and exploits it to reduce regret while adapting to changes, especially when historical data are abundant.
Contribution
The paper proposes a novel invariance-based approach for non-stationary linear bandits, enabling better utilization of past data and reducing regret in dynamic environments.
Findings
Leveraging invariance reduces the problem dimensionality and lowers regret.
ISD-linUCB outperforms existing methods in fast-changing settings.
Leveraging historical data enhances online learning performance.
Abstract
We consider stochastic non-stationary linear bandits where the linear parameter connecting contexts to the reward changes over time. Existing algorithms in this setting localize the policy by gradually discarding or down-weighting past data, effectively shrinking the time horizon over which learning can occur. However, in many settings historical data may still carry partial information about the reward model. We propose to leverage such data while adapting to changes, by assuming the reward model decomposes into stationary and non-stationary components. Based on this assumption, we introduce ISD-linUCB, an algorithm that uses past data to learn invariances in the reward model and subsequently exploits them to improve online performance. We show both theoretically and empirically that leveraging invariance reduces the problem dimensionality, yielding significant regret improvements in…
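The decomposition described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' ISD-linUCB. It assumes the invariant and residual subspaces are already known (the paper learns the invariances from past data), that contexts are isotropic Gaussian so the two components of a context are uncorrelated, and that a plain LinUCB-style rule is run on the residual coordinates only. All names (`make_subspaces`, `estimate_invariant`, `ResidualLinUCB`, `demo`) and all parameter values are hypothetical.

```python
import numpy as np

def make_subspaces(d, d_inv, rng):
    """Random orthonormal bases for the (assumed known) invariant and residual subspaces."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return Q[:, :d_inv], Q[:, d_inv:]

def estimate_invariant(X, y, U_inv):
    """Least squares of historical rewards on the invariant coordinates of the contexts."""
    Z = X @ U_inv
    a_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return U_inv @ a_hat  # map coefficients back to the original d-dimensional space

class ResidualLinUCB:
    """LinUCB-style learner that only estimates the residual (non-stationary) coefficients."""

    def __init__(self, theta_inv_hat, U_res, lam=1.0, alpha=1.0):
        self.theta_inv_hat = theta_inv_hat
        self.U_res = U_res
        k = U_res.shape[1]
        self.V = lam * np.eye(k)   # regularized Gram matrix in residual coordinates
        self.b = np.zeros(k)       # accumulated context-times-residual-reward vector
        self.alpha = alpha         # exploration weight (hypothetical choice)

    def select(self, arms):
        W = arms @ self.U_res
        V_inv = np.linalg.inv(self.V)
        delta_hat = V_inv @ self.b
        mean = arms @ self.theta_inv_hat + W @ delta_hat
        bonus = self.alpha * np.sqrt(np.einsum("ij,jk,ik->i", W, V_inv, W))
        return int(np.argmax(mean + bonus))

    def update(self, x, reward):
        w = self.U_res.T @ x
        self.V += np.outer(w, w)
        # Subtract the part of the reward already explained by the invariant estimate.
        self.b += w * (reward - x @ self.theta_inv_hat)

def demo(seed=0, d=6, d_inv=4, n_hist=5000, n_online=300, noise=0.1):
    rng = np.random.default_rng(seed)
    U_inv, U_res = make_subspaces(d, d_inv, rng)
    theta_inv = U_inv @ rng.standard_normal(d_inv)

    # Historical phase: the residual coefficients jump every 500 rounds.
    X = rng.standard_normal((n_hist, d))
    deltas = np.repeat(rng.standard_normal((n_hist // 500, d - d_inv)), 500, axis=0)
    y = (X @ theta_inv
         + np.einsum("ij,ij->i", X @ U_res, deltas)
         + noise * rng.standard_normal(n_hist))
    theta_inv_hat = estimate_invariant(X, y, U_inv)

    # Online phase: a fresh residual parameter, learned in d - d_inv dimensions only.
    delta_now = rng.standard_normal(d - d_inv)
    theta_now = theta_inv + U_res @ delta_now
    learner = ResidualLinUCB(theta_inv_hat, U_res)
    for _ in range(n_online):
        arms = rng.standard_normal((10, d))
        i = learner.select(arms)
        r = arms[i] @ theta_now + noise * rng.standard_normal()
        learner.update(arms[i], r)
    delta_hat = np.linalg.solve(learner.V, learner.b)
    return (np.linalg.norm(theta_inv_hat - theta_inv),
            np.linalg.norm(delta_hat - delta_now))
```

Calling `demo()` returns the estimation error of the invariant component and of the current residual component. In this toy setup the online learner estimates 2 coefficients instead of 6, which is the sense in which using invariance shrinks the online problem. A real non-stationary method would also need to forget or down-weight residual data after a change, which this sketch omits.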
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Recommender Systems and Techniques
