A View of the Certainty-Equivalence Method for PAC RL as an Application of the Trajectory Tree Method
Shivaram Kalyanakrishnan, Sheel Shah, Santhosh Kumar Guguloth

TL;DR
This paper reveals that the certainty-equivalence method in PAC reinforcement learning can be viewed as an application of the trajectory tree method, leading to simpler proofs and improved sample complexity bounds.
Contribution
It establishes a novel connection between the certainty-equivalence method (CEM) and the trajectory tree method (TTM), yielding new proofs and tighter sample-complexity bounds for PAC RL under weaker assumptions.
Findings
New proofs of sample complexity bounds for CEM
Improved bounds for non-stationary and stationary MDPs
Lower bound showing minimax-optimality in the small-error regime
Abstract
Reinforcement learning (RL) enables an agent interacting with an unknown MDP $M$ to optimise its behaviour by observing transitions sampled from $M$. A natural entity that emerges in the agent's reasoning is $\widehat{M}$, the maximum likelihood estimate of $M$ based on the observed transitions. The well-known \textit{certainty-equivalence} method (CEM) dictates that the agent update its behaviour to $\widehat{\pi}$, which is an optimal policy for $\widehat{M}$. Not only is CEM intuitive, it has been shown to enjoy minimax-optimal sample complexity in some regions of the parameter space for PAC RL with a generative model~\citep{Agarwal2020GenModel}. A seemingly unrelated algorithm is the ``trajectory tree method'' (TTM)~\citep{Kearns+MN:1999}, originally developed for efficient decision-time planning in large POMDPs. This paper presents a theoretical investigation that stems from the…
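The CEM update described in the abstract (estimate $\widehat{M}$ by maximum likelihood, then act optimally for $\widehat{M}$) can be sketched for a small tabular MDP. This is a minimal illustration under assumptions that are not taken from the paper: a finite, discounted MDP with known rewards, a generative model queried `n` times per state-action pair, and value iteration as the planner. The function names `estimate_transitions` and `certainty_equivalent_policy` and the two-state example are hypothetical.

```python
import random
from collections import Counter

def estimate_transitions(sample_next_state, states, actions, n):
    """Maximum-likelihood estimate of P(s' | s, a) from n generative-model calls per (s, a)."""
    p_hat = {}
    for s in states:
        for a in actions:
            counts = Counter(sample_next_state(s, a) for _ in range(n))
            # The MLE of a categorical distribution is its empirical frequency.
            p_hat[(s, a)] = {s2: c / n for s2, c in counts.items()}
    return p_hat

def certainty_equivalent_policy(p_hat, reward, states, actions, gamma, iters=1000):
    """Treat the estimated MDP as if it were the true one: run value iteration
    on it and return a greedy policy, i.e. an optimal policy for M_hat."""
    v = {s: 0.0 for s in states}

    def q(s, a):
        # One-step lookahead under the *estimated* transition probabilities.
        return reward(s, a) + gamma * sum(p * v[s2] for s2, p in p_hat[(s, a)].items())

    for _ in range(iters):
        v = {s: max(q(s, a) for a in actions) for s in states}
    # max() keeps the first maximiser, so ties favour the earlier action in `actions`.
    return {s: max(actions, key=lambda a: q(s, a)) for s in states}

# Hypothetical two-state example: "move" switches state with probability 0.9,
# and reward 1 is earned in state 1.
rng = random.Random(0)
STATES, ACTIONS = [0, 1], ["stay", "move"]

def sample_next_state(s, a):
    if a == "move" and rng.random() < 0.9:
        return 1 - s
    return s

def reward(s, a):
    return 1.0 if s == 1 else 0.0

p_hat = estimate_transitions(sample_next_state, STATES, ACTIONS, n=100)
policy = certainty_equivalent_policy(p_hat, reward, STATES, ACTIONS, gamma=0.9)
print(policy)
```

On this toy problem the certainty-equivalent policy should move out of state 0 and stay in state 1, which is also optimal for the true MDP. The question the paper's PAC analysis addresses is how large `n` must be for such a policy to be near-optimal with high probability; the sketch does not reproduce those bounds.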
