Scalable Exploration via Ensemble++

Yingru Li; Jiawei Xu; Baoxiang Wang; Zhi-Quan Luo

arXiv:2407.13195 · cs.LG · October 29, 2025

TL;DR

Ensemble++ introduces a scalable ensemble-based exploration method for bandit problems, achieving near-optimal regret with significantly fewer ensemble members, and extends to nonlinear rewards with neural features.

Contribution

The paper proposes a novel shared-factor ensemble architecture with random linear combinations, provides theoretical guarantees for it, and extends it in practice to nonlinear rewards.

Findings

1. Achieves regret comparable to exact Thompson Sampling with Θ(d log T) ensemble size.
2. Performs well across linear, quadratic, neural, and GPT-based bandits.
3. Outperforms state-of-the-art methods in regret-computation tradeoff.

Abstract

Thompson Sampling is a principled method for balancing exploration and exploitation, but its real-world adoption faces computational challenges in large-scale or non-conjugate settings. While ensemble-based approaches offer partial remedies, they typically require prohibitively large ensemble sizes. We propose Ensemble++, a scalable exploration framework using a novel shared-factor ensemble architecture with random linear combinations. For linear bandits, we provide theoretical guarantees showing that Ensemble++ achieves regret comparable to exact Thompson Sampling with an ensemble size of only $\Theta(d \log T)$, significantly outperforming prior methods. Crucially, this efficiency holds across both compact and finite action sets with either time-invariant or time-varying contexts without configuration changes. We extend this theoretical foundation to nonlinear rewards by replacing fixed…
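The abstract names the ingredients (shared factors combined by random linear weights, applied to linear bandits) but gives no update rules. The Python sketch below shows one way such a scheme can look for a linear bandit. It is an illustration under assumptions that do not come from the text above, not the authors' implementation: the Gaussian prior, the Gaussian combination weights, the unit-sphere perturbations, and every class, method, and parameter name are hypothetical.

```python
import numpy as np


class SharedFactorLinearEnsemble:
    """Hypothetical sketch of ensemble-style posterior sampling for a linear bandit."""

    def __init__(self, dim, num_factors, prior_var=1.0, noise_std=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.num_factors = num_factors
        self.noise_std = noise_std
        # Ridge-regression precision matrix, starting from an assumed
        # N(0, prior_var * I) prior on the reward parameter.
        self.precision = np.eye(dim) / prior_var
        # Accumulators that store precision @ mean and precision @ factors.
        self.b_mean = np.zeros(dim)
        init = self.rng.standard_normal((dim, num_factors))
        # Scaled so the initial factors A0 satisfy E[A0 @ A0.T] = prior_var * I.
        self.b_factors = init / np.sqrt(prior_var * num_factors)

    def mean(self):
        """Current ridge estimate of the reward parameter."""
        return np.linalg.solve(self.precision, self.b_mean)

    def factors(self):
        """Shared factor matrix of shape (dim, num_factors)."""
        return np.linalg.solve(self.precision, self.b_factors)

    def sample_parameter(self):
        """Draw one parameter as mean + factors @ zeta with random weights zeta."""
        zeta = self.rng.standard_normal(self.num_factors)
        return np.linalg.solve(self.precision, self.b_mean + self.b_factors @ zeta)

    def select(self, actions):
        """Pick the row of `actions` that is greedy for one sampled parameter."""
        theta = self.sample_parameter()
        return int(np.argmax(actions @ theta))

    def update(self, x, reward):
        """Fold in one (feature, reward) pair with a fresh random perturbation."""
        g = self.rng.standard_normal(self.num_factors)
        z = g / np.linalg.norm(g)  # perturbation on the unit sphere (assumption)
        self.precision += np.outer(x, x) / self.noise_std**2
        self.b_mean += x * reward / self.noise_std**2
        self.b_factors += np.outer(x, z) / self.noise_std
```

A typical round would call `select` on the current action matrix, observe a reward, and pass the chosen feature vector and reward to `update`. With unit-norm perturbations, the outer product of the factor matrix with itself matches the ridge posterior covariance in expectation, which is why a single Gaussian weight vector can stand in for an exact posterior draw in this sketch.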

Taxonomy

Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques · Simulation Techniques and Applications

Methods: Attention Is All You Need · Balanced Selection · Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Layer · Weight Decay · Softmax · Multi-Head Attention