TL;DR
This paper introduces a localisation method for the eluder dimension, enabling first-order regret bounds in reinforcement learning and improving classical results for Bernoulli bandits.
Contribution
It shows that standard eluder dimension-based analysis cannot yield first-order regret bounds, and presents a localisation technique for the eluder dimension that overcomes this limitation and achieves first-order bounds in RL.
Findings
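Established a lower bound on the eluder dimension of generalised linear model classes, ruling out first-order regret bounds from standard eluder dimension-based analysis.
Developed a localisation method for the eluder dimension that recovers and improves on classic results for Bernoulli bandits.
Achieved the first genuine first-order bounds for finite-horizon reinforcement learning tasks with bounded cumulative returns.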
Abstract
We establish a lower bound on the eluder dimension of generalised linear model classes, showing that standard eluder dimension-based analysis cannot lead to first-order regret bounds. To address this, we introduce a localisation method for the eluder dimension; our analysis immediately recovers and improves on classic results for Bernoulli bandits, and allows for the first genuine first-order bounds for finite-horizon reinforcement learning tasks with bounded cumulative returns.
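For readers unfamiliar with the central quantity, the block below sketches the standard definition of the eluder dimension as introduced by Russo and Van Roy. It is background only: the notation (a function class \mathcal{F} over an action set \mathcal{A}, scale \epsilon) is assumed here rather than taken from the paper, and the paper's own definition and its localised variant may differ in detail.

```latex
% Background sketch: the standard (non-localised) eluder dimension.
% Notation is an assumption for illustration, not the paper's own.
An action $a \in \mathcal{A}$ is \emph{$\epsilon$-dependent} on
$a_1, \dots, a_n \in \mathcal{A}$ with respect to $\mathcal{F}$ if,
for all $f, f' \in \mathcal{F}$,
\[
  \sqrt{\sum_{i=1}^{n} \bigl( f(a_i) - f'(a_i) \bigr)^2} \le \epsilon
  \quad \Longrightarrow \quad
  \lvert f(a) - f'(a) \rvert \le \epsilon ,
\]
and \emph{$\epsilon$-independent} otherwise.
The $\epsilon$-eluder dimension $\dim_E(\mathcal{F}, \epsilon)$ is the
length of the longest sequence in $\mathcal{A}$ such that, for some
$\epsilon' \ge \epsilon$, every element is $\epsilon'$-independent of
its predecessors.
```

Intuitively, the dimension counts how many times a learner can be surprised by a new action even though the candidate functions agree closely on everything seen so far.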