Regret minimization in Linear Bandits with offline data via extended D-optimal exploration
Sushant Vijayan, Arun Suggala, Karthikeyan Shanmugam, Soumyabrata Pal

TL;DR
This paper introduces OOPE, an algorithm that leverages offline data with extended D-optimal design to minimize online regret in linear bandits, adapting to data quality and improving efficiency.
Contribution
The paper proposes a novel algorithm, OOPE, that effectively incorporates offline data into online regret minimization for linear bandits, with theoretical guarantees and design improvements.
Findings
Achieves regret bounds depending on offline data quality and quantity.
Provides minimax lower bounds that match the upper bounds in key regimes.
Improves design complexity from O(d^2) to O(d^2/δ_eff) in high dimensions.
Abstract
We consider the problem of online regret minimization in linear bandits with access to prior observations (offline data) from the underlying bandit model. Extensive offline data is often available in applications such as recommendation systems and online advertising. Consequently, this problem has been studied intensively in recent literature. Our algorithm, Offline-Online Phased Elimination (OOPE), effectively incorporates the offline data to substantially reduce the online regret compared to prior work. To leverage offline information prudently, OOPE uses an extended D-optimal design within each exploration phase. OOPE achieves an online regret bound expressed in terms of the effective problem dimension, which measures the number of poorly explored directions in offline data and depends on the…
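As a rough illustration of how a D-optimal design can account for offline data, the sketch below runs a plain Frank-Wolfe iteration on the objective log det(V_off + n * sum_a pi(a) a a^T), where V_off is the Gram matrix of the offline data. This is not the paper's algorithm: the objective, the function name `extended_d_optimal`, the parameters `n_online`, `iters`, and `ridge`, and the 1/(t+2) step size are all assumptions chosen for illustration.

```python
import numpy as np


def extended_d_optimal(arms, v_off, n_online, iters=500, ridge=1e-8):
    """Frank-Wolfe sketch for a D-optimal design augmented with offline data.

    Hypothetical objective (an assumption, not taken from the paper):
        maximize over pi in the simplex
        log det(v_off + n_online * sum_a pi[a] * a a^T)

    arms     : (k, d) array, one arm feature vector per row
    v_off    : (d, d) Gram matrix of the offline data
    n_online : planned number of online pulls in the phase
    """
    k, d = arms.shape
    pi = np.full(k, 1.0 / k)  # start from the uniform design
    for t in range(iters):
        # Information matrix of offline data plus the current online design.
        m = v_off + n_online * (arms.T * pi) @ arms + ridge * np.eye(d)
        m_inv = np.linalg.inv(m)
        # Up to the factor n_online, the gradient w.r.t. pi[a] is a^T m^{-1} a.
        scores = np.einsum("ij,jk,ik->i", arms, m_inv, arms)
        best = int(np.argmax(scores))
        # Move a fraction of the mass toward the most informative arm.
        step = 1.0 / (t + 2.0)
        pi *= 1.0 - step
        pi[best] += step
    return pi


if __name__ == "__main__":
    # Two orthogonal arms; offline data covers only the first direction.
    arms = np.eye(2)
    v_off = np.diag([100.0, 0.0])
    print(extended_d_optimal(arms, v_off, n_online=10))
```

In this toy case the design puts nearly all of its mass on the second arm, the direction the offline data left unexplored. That is the intuition behind counting only poorly explored directions in an effective dimension.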
