
TL;DR
This paper introduces efficient implementations of the EXP3 algorithm that operate in constant time per round, along with new algorithms that balance regret bounds and computational efficiency.
Contribution
It shows that EXP3 can be implemented in constant time per round, proposes more practical variants, and analyzes the trade-offs between their regret bounds and time complexities.
Findings
EXP3 can be implemented in constant time per round.
The newly proposed algorithms trade off regret bounds against per-round time complexity.
The proposed methods are more practical for large-scale online learning.
Abstract
We point out that EXP3 can be implemented in constant time per round, propose more practical algorithms, and analyze the trade-offs between the regret bounds and time complexities of these algorithms.
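For context, the following is a minimal sketch of textbook EXP3 in its loss-based form, not the constant-time implementation the paper describes. It recomputes the full sampling distribution every round, so each round costs time linear in the number of arms, which is the baseline cost a constant-time implementation would avoid. The function name `exp3`, the `loss_fn` callback, and the learning-rate choice in the usage lines are illustrative assumptions and do not come from the paper.

```python
import math
import random


def exp3(num_arms, num_rounds, loss_fn, eta, rng=None):
    """Textbook EXP3 with importance-weighted loss estimates.

    Illustrative baseline only (hypothetical interface): every round
    rebuilds the sampling distribution, costing O(num_arms) time.
    loss_fn(t, arm) must return a loss in [0, 1].
    """
    rng = rng or random.Random(0)
    cum_loss_est = [0.0] * num_arms  # cumulative estimated loss per arm
    pulls = [0] * num_arms           # how often each arm was played
    total_loss = 0.0

    for t in range(num_rounds):
        # Shift by the minimum so the largest weight is exactly 1 (avoids underflow).
        base = min(cum_loss_est)
        weights = [math.exp(-eta * (est - base)) for est in cum_loss_est]
        total = sum(weights)
        probs = [w / total for w in weights]

        # Sample an arm from the exponential-weights distribution; O(num_arms).
        arm = rng.choices(range(num_arms), weights=probs, k=1)[0]

        # Only the played arm's loss is observed (bandit feedback).
        loss = loss_fn(t, arm)

        # Importance weighting keeps the loss estimate unbiased.
        cum_loss_est[arm] += loss / probs[arm]
        pulls[arm] += 1
        total_loss += loss

    return pulls, total_loss


# Usage: 5 arms, arm 2 always has loss 0, all others have loss 1.
K, T = 5, 2000
eta = math.sqrt(2 * math.log(K) / (T * K))  # a common tuning, assumed here
pulls, total_loss = exp3(K, T, lambda t, a: 0.0 if a == 2 else 1.0, eta)
print(pulls, total_loss)
```

The two linear-time steps, computing `probs` and sampling from it, are where the per-round cost arises in this naive version.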
Taxonomy
Topics: Advanced Bandit Algorithms Research · Risk and Portfolio Optimization · Constraint Satisfaction and Optimization
