Infrequent Exploration in Linear Bandits

Harin Lee; Min-hwan Oh

arXiv:2510.26000·cs.LG·October 31, 2025

TL;DR

This paper introduces INFEX, a practical framework for infrequent exploration in linear bandits. It runs an exploratory policy only at scheduled steps and acts greedily in between, retaining near-optimal regret while exploring less often, which suits safety-critical or costly applications.

Contribution

INFEX is a simple, modular framework that enables infrequent exploration in linear bandits, matching standard regret bounds and improving computational efficiency.

Findings

1. INFEX achieves instance-dependent regret comparable to fully adaptive methods.

2. INFEX allows integration of any exploration strategy, enhancing flexibility.

3. Empirical results show state-of-the-art regret and runtime improvements.

Abstract

We study the problem of infrequent exploration in linear bandits, addressing a significant yet overlooked gap between fully adaptive exploratory methods (e.g., UCB and Thompson Sampling), which explore potentially at every time step, and purely greedy approaches, which require stringent diversity assumptions to succeed. Continuous exploration can be impractical or unethical in safety-critical or costly domains, while purely greedy strategies typically fail without adequate contextual diversity. To bridge these extremes, we introduce a simple and practical framework, INFEX, explicitly designed for infrequent exploration. INFEX executes a base exploratory policy according to a given schedule while predominantly choosing greedy actions in between. Despite its simplicity, our theoretical analysis demonstrates that INFEX achieves instance-dependent regret matching standard provably efficient…
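The scheduling idea in the abstract (run a base exploratory policy at scheduled steps, act greedily otherwise) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions not taken from the paper: a fixed finite arm set, a ridge-regression estimator, a LinUCB-style rule as the base policy, perfect-square time steps as the schedule, and hypothetical names throughout (`RidgeEstimator`, `run_scheduled`, `explore_steps`, `beta`).

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


class RidgeEstimator:
    """Ridge regression state; V^{-1} is maintained by Sherman-Morrison updates."""

    def __init__(self, dim, lam=1.0):
        self.V_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def update(self, x, reward):
        # (V + x x^T)^{-1} = V^{-1} - (V^{-1} x)(V^{-1} x)^T / (1 + x^T V^{-1} x)
        Vx = matvec(self.V_inv, x)
        denom = 1.0 + dot(x, Vx)
        for i in range(len(x)):
            for j in range(len(x)):
                self.V_inv[i][j] -= Vx[i] * Vx[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

    def theta_hat(self):
        return matvec(self.V_inv, self.b)

    def width(self, x):
        # Confidence width sqrt(x^T V^{-1} x), clipped at zero for safety.
        return math.sqrt(max(dot(x, matvec(self.V_inv, x)), 0.0))


def greedy_arm(est, arms):
    theta = est.theta_hat()
    return max(range(len(arms)), key=lambda i: dot(arms[i], theta))


def linucb_arm(est, arms, beta=1.0):
    # Assumed base exploratory policy; any other rule could be swapped in here.
    theta = est.theta_hat()
    return max(range(len(arms)),
               key=lambda i: dot(arms[i], theta) + beta * est.width(arms[i]))


def run_scheduled(arms, theta_star, horizon, explore_steps, noise=0.1, seed=0):
    """Explore only at steps in `explore_steps`; choose greedily at all others."""
    rng = random.Random(seed)
    est = RidgeEstimator(len(theta_star))
    best = max(dot(a, theta_star) for a in arms)
    regret, n_explore = 0.0, 0
    for t in range(1, horizon + 1):
        if t in explore_steps:
            i = linucb_arm(est, arms)
            n_explore += 1
        else:
            i = greedy_arm(est, arms)
        x = arms[i]
        reward = dot(x, theta_star) + rng.gauss(0.0, noise)
        est.update(x, reward)
        regret += best - dot(x, theta_star)
    return regret, n_explore


# Illustrative problem instance (all values are made up for the example).
arms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
theta_star = [0.9, 0.3]
schedule = {k * k for k in range(1, 33)}  # assumed schedule: t = 1, 4, 9, ...
regret, n_explore = run_scheduled(arms, theta_star, horizon=1000,
                                  explore_steps=schedule)
```

Because the base policy is passed the same estimator state as the greedy rule, replacing `linucb_arm` with another exploratory rule only changes one call site, which mirrors the modularity the summary describes. The schedule is just a set of time steps, so denser or sparser exploration is a matter of changing `schedule`.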
