Provably Efficient and Agile Randomized Q-Learning
He Wang, Xingyu Xu, Yuejie Chi

TL;DR
This paper introduces RandomizedQ, a new model-free RL algorithm that combines sampling-based exploration with agile updates, achieving provably efficient regret bounds and superior empirical performance.
Contribution
The paper proposes a novel Q-learning variant, RandomizedQ, with theoretical regret guarantees and improved responsiveness over existing methods.
Findings
Achieves a $\tilde{O}(\sqrt{H^5SAT})$ regret bound.
Demonstrates superior empirical performance on benchmarks.
Provides a logarithmic regret bound under mild conditions.
Abstract
While Bayesian-based exploration often demonstrates superior empirical performance compared to bonus-based methods in model-based reinforcement learning (RL), its theoretical understanding remains limited for model-free settings. Existing provable algorithms either suffer from computational intractability or rely on stage-wise policy updates, which reduce responsiveness and slow down the learning process. In this paper, we propose a novel variant of the Q-learning algorithm, referred to as RandomizedQ, which integrates sampling-based exploration with agile, step-wise policy updates for episodic tabular RL. We establish an $\tilde{O}(\sqrt{H^5SAT})$ regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the episode length, and $T$ is the total number of episodes. In addition, we present a logarithmic regret bound under a mild positive sub-optimality…
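To make the idea of sampling-based exploration with step-wise updates concrete, here is a minimal Python sketch for an episodic tabular MDP. It is not the RandomizedQ algorithm from the paper, whose details are not given in this summary. It only illustrates the general pattern under stated assumptions: a small ensemble of Q-tables, each updated after every step with an independently drawn random learning rate, and a greedy policy taken with respect to the ensemble maximum. The function name `randomized_q_sketch`, the ensemble size, the Beta-distributed learning rates, and the initialization are all hypothetical choices made for illustration.

```python
import numpy as np

def randomized_q_sketch(P, R, H, episodes, n_ensemble=10, seed=0):
    """Illustrative sketch only (not the paper's algorithm).

    P: transition tensor of shape (S, A, S); each P[s, a] sums to 1.
    R: reward matrix of shape (S, A) with entries in [0, 1].
    H: episode length. Every episode starts in state 0 (an assumption).
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    # Optimistic initialization: at step h at most H - h reward remains.
    init = (H - np.arange(H, dtype=float))[None, :, None, None]
    Q = np.broadcast_to(init, (n_ensemble, H, S, A)).copy()
    counts = np.zeros((H, S, A), dtype=int)
    returns = np.zeros(episodes)

    for ep in range(episodes):
        s = 0
        for h in range(H):
            # Act greedily w.r.t. the ensemble maximum (randomized optimism).
            a = int(np.argmax(Q[:, h, s, :].max(axis=0)))
            s_next = int(rng.choice(S, p=P[s, a]))
            r = R[s, a]
            returns[ep] += r

            counts[h, s, a] += 1
            n = counts[h, s, a]
            # Value of the next state under the same ensemble-max rule.
            v_next = Q[:, h + 1, s_next, :].max() if h + 1 < H else 0.0
            # Assumed noise model: one random learning rate per ensemble
            # member, drawn from Beta(H + 1, n), so rates shrink as n grows.
            w = rng.beta(H + 1, n, size=n_ensemble)
            # Step-wise update: applied immediately, not at stage boundaries.
            Q[:, h, s, a] = (1.0 - w) * Q[:, h, s, a] + w * (r + v_next)
            s = s_next
    return Q, returns
```

The point of the sketch is the contrast drawn in the abstract: the Q-values, and therefore the greedy policy, change after every single transition rather than only at the end of a stage, while the randomness in the learning rates stands in for an explicit exploration bonus.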
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference
