Scalable Exploration via Ensemble++
Yingru Li, Jiawei Xu, Baoxiang Wang, Zhi-Quan Luo

TL;DR
Ensemble++ introduces a scalable ensemble-based exploration method for bandit problems, achieving near-optimal regret with significantly fewer ensemble members, and extends to nonlinear rewards with neural features.
Contribution
It proposes a novel shared-factor ensemble architecture with random linear combinations, providing theoretical guarantees and practical extensions to nonlinear rewards.
Findings
Achieves regret comparable to exact Thompson Sampling with Θ(d log T) ensemble size.
Performs well across linear, quadratic, neural, and GPT-based bandits.
Outperforms state-of-the-art methods in regret-computation tradeoff.
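Because Θ(·) hides constants, the short Python sketch below only illustrates how the stated ensemble size grows with the dimension d and the horizon T. The function name `ensemble_size_order` and the use of the natural logarithm are illustrative assumptions, and the printed values are growth rates, not the ensemble sizes used in the paper.

```python
import math


def ensemble_size_order(d, horizon):
    """Hypothetical helper: d * ln(T), the growth rate in Theta(d log T) with constants dropped."""
    return d * math.log(horizon)


# Linear growth in d, only logarithmic growth in the horizon T.
for d, horizon in [(10, 10**4), (10, 10**6), (100, 10**6)]:
    print(d, horizon, round(ensemble_size_order(d, horizon), 1))
```

Multiplying the horizon by 100 adds only d · ln 100 (about 4.6 d) to this quantity, while multiplying d by 10 scales it tenfold.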
Abstract
Thompson Sampling is a principled method for balancing exploration and exploitation, but its real-world adoption faces computational challenges in large-scale or non-conjugate settings. While ensemble-based approaches offer partial remedies, they typically require prohibitively large ensemble sizes. We propose Ensemble++, a scalable exploration framework using a novel shared-factor ensemble architecture with random linear combinations. For linear bandits, we provide theoretical guarantees showing that Ensemble++ achieves regret comparable to exact Thompson Sampling with only Θ(d log T) ensemble sizes, significantly outperforming prior methods. Crucially, this efficiency holds across both compact and finite action sets with either time-invariant or time-varying contexts without configuration changes. We extend this theoretical foundation to nonlinear rewards by replacing fixed…
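The abstract does not spell out the update rules, so the Python sketch below is only one hypothetical reading of "a shared-factor ensemble with random linear combinations" for a linear bandit, not the authors' algorithm. It keeps a single mean vector μ and a single d × m factor matrix A shared by all m ensemble directions; a sample is μ + Aζ, where ζ is a random combination vector. Every name (`LinearEnsemblePP`, `run_bandit`, `lam`, `m`) and every modeling choice (prior N(0, I/λ), unit observation noise, Gaussian ζ ~ N(0, I_m), perturbations z ~ N(0, I_m/m), Sherman–Morrison updates of Σ) is an assumption made for illustration. Under these choices AAᵀ roughly tracks the ridge posterior covariance Σ in expectation.

```python
import math
import random


class LinearEnsemblePP:
    """Hypothetical shared-factor ensemble sampler for a linear bandit.

    Assumptions (illustrative, not from the paper): prior N(0, I/lam), unit
    observation noise, perturbations z ~ N(0, I_m / m), weights zeta ~ N(0, I_m).
    """

    def __init__(self, d, m, lam=1.0, rng=None):
        self.d, self.m = d, m
        self.rng = rng or random.Random(0)
        # Sigma = (lam*I + sum_t x_t x_t^T)^{-1}, kept current via Sherman-Morrison.
        self.sigma = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
        # b = Sigma^{-1} mu = sum_t x_t y_t (the prior mean is zero).
        self.b = [0.0] * d
        # B = Sigma^{-1} A; B_0 = lam * A_0 with A_0 = Z / sqrt(lam * m), so A_0 A_0^T ~ I / lam.
        self.B = [[math.sqrt(lam / m) * self.rng.gauss(0.0, 1.0) for _ in range(m)]
                  for _ in range(d)]

    def _matvec(self, mat, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in mat]

    def mean(self):
        return self._matvec(self.sigma, self.b)

    def factor(self):
        # A = Sigma B: a single d x m factor shared by all m ensemble directions.
        return [[sum(self.sigma[i][j] * self.B[j][k] for j in range(self.d))
                 for k in range(self.m)] for i in range(self.d)]

    def sample(self):
        # A posterior-like draw: mu plus a random linear combination of A's columns.
        zeta = [self.rng.gauss(0.0, 1.0) for _ in range(self.m)]
        mu, a = self.mean(), self.factor()
        return [mu[i] + sum(a[i][k] * zeta[k] for k in range(self.m)) for i in range(self.d)]

    def update(self, x, y):
        sx = self._matvec(self.sigma, x)  # Sigma x (Sigma is symmetric)
        denom = 1.0 + sum(xi * si for xi, si in zip(x, sx))
        for i in range(self.d):
            for j in range(self.d):
                self.sigma[i][j] -= sx[i] * sx[j] / denom
        # Fresh perturbation vector z with one entry per ensemble direction.
        z = [self.rng.gauss(0.0, 1.0 / math.sqrt(self.m)) for _ in range(self.m)]
        for i in range(self.d):
            self.b[i] += x[i] * y
            for k in range(self.m):
                self.B[i][k] += x[i] * z[k]


def run_bandit(theta_star, actions, steps, m=8, noise=1.0, seed=0):
    """Hypothetical finite-action linear bandit loop; returns cumulative pseudo-regret."""
    env_rng = random.Random(seed)
    agent = LinearEnsemblePP(len(theta_star), m, rng=random.Random(seed + 1))
    values = [sum(a * t for a, t in zip(x, theta_star)) for x in actions]
    best, regret = max(values), 0.0
    for _ in range(steps):
        theta = agent.sample()
        idx = max(range(len(actions)),
                  key=lambda i: sum(a * t for a, t in zip(actions[i], theta)))
        agent.update(actions[idx], values[idx] + env_rng.gauss(0.0, noise))
        regret += best - values[idx]
    return regret


print(run_bandit([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], steps=200))
```

In this sketch the per-step cost depends on m and d but not on the number of past observations, and all ensemble directions share one covariance update; that sharing is the design choice the sketch is meant to illustrate.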
Taxonomy
Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques · Simulation Techniques and Applications
Methods: Attention Is All You Need · Balanced Selection · Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Layer · Weight Decay · Softmax · Multi-Head Attention
