Improving Reinforcement Learning Sample-Efficiency using Local Approximation
Mohit Prashant, Arvind Easwaran

TL;DR
This paper introduces a novel approach to improve reinforcement learning sample-efficiency by leveraging local approximations of the MDP, resulting in sharper PAC bounds and a logarithmic reduction in sample complexity.
Contribution
The study derives sharper PAC bounds for RL, introduces a local approximation method to reduce sample complexity, and extends the approach to model-free settings with experimental validation.
Findings
Sample complexity reduced to O(SA log A)
Sharper PAC bounds than existing literature
Significant empirical improvements demonstrated
Abstract
In this study, we derive Probably Approximately Correct (PAC) bounds on the asymptotic sample-complexity for RL within the infinite-horizon Markov Decision Process (MDP) setting that are sharper than those in existing literature. The premise of our study is twofold: firstly, the further two states are from each other, transition-wise, the less relevant the value of the first state is when learning the ε-optimal value of the second; secondly, the amount of 'effort', sample-complexity-wise, expended in learning the ε-optimal value of a state is independent of the number of samples required to learn the ε-optimal value of a second state that is a sufficient number of transitions away from the first. Inversely, states within each other's vicinity have values that are dependent on each other and will require a similar number of samples to learn. By approximating…
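The first premise of the abstract (distant states matter less) can be illustrated with a standard fact about discounted MDPs. This is a minimal sketch under assumptions the abstract does not state: a discount factor `gamma` in (0, 1) and rewards bounded by `r_max`. Under those assumptions, everything that happens more than H transitions away changes a state's value by at most gamma^H * r_max / (1 - gamma), so a neighbourhood of radius H is enough for an ε-accurate value. The function name `effective_horizon` is hypothetical and is not taken from the paper, and the paper's own local approximation may be defined differently.

```python
import math


def effective_horizon(gamma: float, epsilon: float, r_max: float = 1.0) -> int:
    """Smallest H such that gamma**H * r_max / (1 - gamma) <= epsilon.

    Assumes a discounted MDP with rewards in [0, r_max]; both are
    illustrative assumptions, not details taken from the paper.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if epsilon <= 0.0 or r_max <= 0.0:
        raise ValueError("epsilon and r_max must be positive")

    # Rearranged condition: gamma**H <= epsilon * (1 - gamma) / r_max.
    threshold = epsilon * (1.0 - gamma) / r_max
    if threshold >= 1.0:
        return 0  # the discounted tail is already below epsilon at H = 0

    # log(gamma) is negative, so dividing flips the inequality to H >= ...
    return math.ceil(math.log(threshold) / math.log(gamma))


if __name__ == "__main__":
    for gamma in (0.5, 0.9, 0.99):
        print(gamma, effective_horizon(gamma, epsilon=0.1))
```

The radius grows only logarithmically in 1/ε, which is the intuition behind treating states that are many transitions apart as nearly independent for learning purposes.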
Taxonomy
Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics · Metaheuristic Optimization Algorithms Research
