TL;DR
The paper introduces Platypoos, a scale-free planning algorithm for environments with deterministic dynamics and stochastic, unbounded rewards. Its sample complexity analysis holds across a broad range of reward scales and discount factors, and a matching lower bound shows it is optimal up to constants.
Contribution
It presents Platypoos, a simple scale-free planning algorithm that adapts to the unknown scale and smoothness of the reward function, together with a sample complexity analysis that improves upon prior work.
Findings
Platypoos comes with a sample complexity bound that improves upon prior work.
The analysis holds simultaneously over a broad range of discount factors and reward scales, without the algorithm knowing them.
A matching lower bound shows the analysis is optimal up to constants.
Abstract
We address the problem of planning in an environment with deterministic dynamics and stochastic rewards with discounted returns. The optimal value function is not known, nor are the rewards bounded. We propose Platypoos, a simple scale-free planning algorithm that adapts to the unknown scale and smoothness of the reward function. We provide a sample complexity analysis for Platypoos that improves upon prior work and holds simultaneously over a broad range of discount factors and reward scales, without the algorithm knowing them. We also establish a matching lower bound showing our analysis is optimal up to constants.