TL;DR
This paper introduces INFEX, a practical framework for infrequent exploration in linear bandits. It achieves near-optimal regret while exploring only occasionally, which makes it suitable for safety-critical or costly applications.
Contribution
INFEX is a simple, modular framework that enables infrequent exploration in linear bandits, matching standard regret bounds and improving computational efficiency.
Findings
INFEX achieves instance-dependent regret comparable to fully adaptive methods.
INFEX allows integration of any exploration strategy, enhancing flexibility.
Empirical results show state-of-the-art regret and improved runtime.
Abstract
We study the problem of infrequent exploration in linear bandits, addressing a significant yet overlooked gap between fully adaptive exploratory methods (e.g., UCB and Thompson Sampling), which explore potentially at every time step, and purely greedy approaches, which require stringent diversity assumptions to succeed. Continuous exploration can be impractical or unethical in safety-critical or costly domains, while purely greedy strategies typically fail without adequate contextual diversity. To bridge these extremes, we introduce a simple and practical framework, INFEX, explicitly designed for infrequent exploration. INFEX executes a base exploratory policy according to a given schedule while predominantly choosing greedy actions in between. Despite its simplicity, our theoretical analysis demonstrates that INFEX achieves instance-dependent regret matching standard provably efficient…
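The mechanism described in the abstract (run a base exploratory policy on scheduled rounds, act greedily otherwise) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything named here is an assumption made for illustration: the `RidgeEstimator` class, the LinUCB-style `ucb_action` standing in for the base policy, the powers-of-two schedule, the fixed arm set, and the choice to update the estimator on every round.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


class RidgeEstimator:
    """Hypothetical helper: ridge regression with the inverse Gram matrix
    maintained incrementally via the Sherman-Morrison formula."""

    def __init__(self, dim, lam=1.0):
        self.A_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def update(self, x, reward):
        Ax = mat_vec(self.A_inv, x)          # A^{-1} x (A^{-1} is symmetric)
        denom = 1.0 + dot(x, Ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
            self.b[i] += reward * x[i]

    def theta(self):
        return mat_vec(self.A_inv, self.b)

    def width(self, x):
        # Confidence width sqrt(x^T A^{-1} x), clipped against rounding error.
        return max(dot(x, mat_vec(self.A_inv, x)), 0.0) ** 0.5


def greedy_action(est, arms):
    theta = est.theta()
    return max(range(len(arms)), key=lambda a: dot(arms[a], theta))


def ucb_action(est, arms, alpha=1.0):
    # Assumed base exploratory policy (LinUCB-style); any other could be used.
    theta = est.theta()
    return max(range(len(arms)),
               key=lambda a: dot(arms[a], theta) + alpha * est.width(arms[a]))


def run_infex_sketch(arms, true_theta, horizon, explore_rounds,
                     noise=0.1, seed=0):
    rng = random.Random(seed)
    est = RidgeEstimator(len(true_theta))
    explored, total_reward = 0, 0.0
    for t in range(1, horizon + 1):
        if t in explore_rounds:              # scheduled exploration round
            a = ucb_action(est, arms)
            explored += 1
        else:                                # greedy round in between
            a = greedy_action(est, arms)
        reward = dot(arms[a], true_theta) + rng.gauss(0.0, noise)
        est.update(arms[a], reward)          # assumption: learn from all rounds
        total_reward += reward
    return est, explored, total_reward


if __name__ == "__main__":
    arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    schedule = {1, 2, 4, 8, 16, 32, 64}      # assumed schedule, not from the paper
    est, explored, total = run_infex_sketch(arms, [0.2, 0.9], 100, schedule)
    print("exploration rounds:", explored, "estimate:", est.theta())
```

Here the schedule is just a set of round indices, so swapping in a different base policy or a sparser schedule only changes the `ucb_action` call or the `explore_rounds` argument.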