Scale-free adaptive planning for deterministic dynamics & discounted rewards

Peter L. Bartlett; Victor Gabillon; Jennifer Healey; Michal Valko

arXiv:2604.18312 · cs.LG · April 21, 2026

TL;DR

The paper introduces Platypoos, a scale-free planning algorithm for environments with deterministic dynamics and stochastic, unbounded rewards. Its sample complexity analysis holds across reward scales and discount factors and is optimal up to constants.

Contribution

It presents Platypoos, a new scale-free planning algorithm that adapts to the unknown scale and smoothness of the reward function. Its analysis improves upon prior methods and holds over a range of discount factors and reward scales that the algorithm does not need to know.

Findings

1. Platypoos's sample complexity bound improves upon prior work.

2. The analysis holds simultaneously over a broad range of discount factors and reward scales.

3. A matching lower bound shows that the analysis is optimal up to constants.

Abstract

We address the problem of planning in an environment with deterministic dynamics, stochastic rewards, and discounted returns. The optimal value function is not known, nor are the rewards bounded. We propose Platypoos, a simple scale-free planning algorithm that adapts to the unknown scale and smoothness of the reward function. We provide a sample complexity analysis for Platypoos that improves upon prior work and holds simultaneously over a broad range of discount factors and reward scales, without the algorithm knowing them. We also establish a matching lower bound showing our analysis is optimal up to constants.
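To make the problem setting concrete, here is a minimal Python sketch under stated assumptions. It is not Platypoos, whose procedure is not described on this page; it is only a naive uniform-allocation baseline for open-loop planning. With deterministic dynamics a plan reduces to a fixed action sequence, each rollout of that sequence returns noisy rewards, and the baseline picks the sequence with the highest empirical discounted return. The toy environment `ToyEnv`, its reward function, `uniform_open_loop_planner`, and all parameter values are hypothetical illustrations.

```python
import random
from itertools import product
from statistics import fmean


def discounted_return(rewards, gamma):
    """Return sum_{t>=0} gamma**t * rewards[t] for a finite reward list."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


class ToyEnv:
    """Hypothetical toy problem: deterministic dynamics, Gaussian reward noise.

    The state is the tuple of actions taken so far, so transitions are
    deterministic; only the observed rewards are random. `scale` multiplies
    every observed reward (mean and noise alike) and stands in for the
    unknown reward scale.
    """

    def __init__(self, n_actions=2, noise_std=0.5, scale=1.0, seed=0):
        self.n_actions = n_actions
        self.noise_std = noise_std
        self.scale = scale
        self.rng = random.Random(seed)

    def mean_reward(self, state, action):
        # Arbitrary illustrative choice: action 1 pays 1.0 right after
        # action 0; otherwise the mean reward is 0.1 * action.
        return 1.0 if (action == 1 and state[-1:] == (0,)) else 0.1 * action

    def rollout(self, actions):
        """Play an open-loop action sequence and return its noisy rewards."""
        state, rewards = (), []
        for a in actions:
            noisy = self.mean_reward(state, a) + self.rng.gauss(0.0, self.noise_std)
            rewards.append(self.scale * noisy)
            state = state + (a,)
        return rewards


def uniform_open_loop_planner(env, depth, gamma, budget):
    """Naive baseline (NOT Platypoos): split the rollout budget evenly over
    all action sequences of length `depth` and return (first action, best
    sequence, its empirical mean discounted return)."""
    sequences = list(product(range(env.n_actions), repeat=depth))
    per_seq = max(1, budget // len(sequences))
    estimates = {
        seq: fmean(discounted_return(env.rollout(seq), gamma) for _ in range(per_seq))
        for seq in sequences
    }
    best = max(sequences, key=estimates.__getitem__)
    return best[0], best, estimates[best]


if __name__ == "__main__":
    # Same seed, rewards rescaled by 1024 (a power of two, so the scaling is
    # exact in floating point): the chosen sequence does not change.
    small = uniform_open_loop_planner(ToyEnv(scale=1.0, seed=3), 3, 0.9, 800)
    large = uniform_open_loop_planner(ToyEnv(scale=1024.0, seed=3), 3, 0.9, 800)
    print(small[1], large[1])
```

Because the baseline only compares empirical means, multiplying every reward by a positive constant leaves its choice unchanged, so it is scale-free in a trivial sense. What it lacks is adaptivity: it spends as many rollouts on clearly poor sequences as on promising ones, and its planning depth is fixed in advance. If mean rewards were bounded in absolute value by some R, the ignored tail beyond `depth` could contribute up to γ^depth·R/(1−γ), which is one way an unknown reward scale complicates choosing how deep to plan.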

Videos

Scale-free adaptive planning for deterministic dynamics & discounted rewards (SlidesLive)