A View of the Certainty-Equivalence Method for PAC RL as an Application of the Trajectory Tree Method

Shivaram Kalyanakrishnan; Sheel Shah; Santhosh Kumar Guguloth

arXiv:2501.02652 · cs.LG · February 24, 2025

TL;DR

This paper reveals that the certainty-equivalence method in PAC reinforcement learning can be viewed as an application of the trajectory tree method, leading to simpler proofs and improved sample complexity bounds.

Contribution

The paper establishes a novel connection between the certainty-equivalence method (CEM) and the trajectory tree method (TTM), yielding new proofs and tighter sample-complexity bounds for PAC RL under weaker assumptions.

Findings

1. New proofs of sample-complexity bounds for CEM
2. Improved bounds for non-stationary and stationary MDPs
3. A lower bound showing minimax optimality in the small-error regime

Abstract

Reinforcement learning (RL) enables an agent interacting with an unknown MDP $M$ to optimise its behaviour by observing transitions sampled from $M$. A natural entity that emerges in the agent's reasoning is $\widehat{M}$, the maximum likelihood estimate of $M$ based on the observed transitions. The well-known certainty-equivalence method (CEM) dictates that the agent update its behaviour to $\widehat{\pi}$, which is an optimal policy for $\widehat{M}$. Not only is CEM intuitive, it has been shown to enjoy minimax-optimal sample complexity in some regions of the parameter space for PAC RL with a generative model (Agarwal et al., 2020). A seemingly unrelated algorithm is the "trajectory tree method" (TTM) (Kearns et al., 1999), originally developed for efficient decision-time planning in large POMDPs. This paper presents a theoretical investigation that stems from the…
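The CEM recipe described in the abstract (estimate $\widehat{M}$ from sampled transitions, then act optimally for $\widehat{M}$) can be sketched as below. This is a minimal illustration under assumptions that are not taken from the paper: a finite, discounted MDP with known rewards, a generative model that returns one sampled next state per call, the same number n of samples for every state-action pair, and value iteration as the planner. The function names and the two-state toy MDP are hypothetical, and the sketch does not reproduce the paper's analysis or its trajectory-tree view.

```python
import random


def estimate_model(sample_next_state, num_states, num_actions, n):
    """MLE P_hat(s' | s, a) from n generative-model calls per (s, a) (assumed setting)."""
    p_hat = []
    for s in range(num_states):
        rows = []
        for a in range(num_actions):
            counts = [0] * num_states
            for _ in range(n):
                counts[sample_next_state(s, a)] += 1
            # Empirical frequencies are the MLE of a categorical distribution.
            rows.append([c / n for c in counts])
        p_hat.append(rows)
    return p_hat


def value_iteration(p, r, gamma, tol=1e-10):
    """Solve a fully specified discounted MDP (p, r); return a greedy policy and values."""
    num_states, num_actions = len(p), len(p[0])
    v = [0.0] * num_states
    while True:
        q = [[r[s][a] + gamma * sum(p[s][a][t] * v[t] for t in range(num_states))
              for a in range(num_actions)]
             for s in range(num_states)]
        v_new = [max(q_s) for q_s in q]
        if max(abs(x - y) for x, y in zip(v_new, v)) < tol:
            break
        v = v_new
    policy = [max(range(num_actions), key=q_s.__getitem__) for q_s in q]
    return policy, v_new


def certainty_equivalent_policy(sample_next_state, r, gamma, n):
    """CEM: build M_hat from samples, then plan as if M_hat were the true MDP."""
    num_states, num_actions = len(r), len(r[0])
    p_hat = estimate_model(sample_next_state, num_states, num_actions, n)
    return value_iteration(p_hat, r, gamma)


# Hypothetical two-state, two-action MDP, used only for illustration.
REWARDS = [[0.0, 0.0],   # state 0: no reward for either action
           [1.0, 0.0]]   # state 1: action 0 earns reward 1


def toy_sampler(s, a):
    """Generative model for the toy MDP (an assumption, not from the paper)."""
    if s == 0:
        # Action 1 reaches state 1 with probability 0.9; action 0 stays in state 0.
        return 1 if a == 1 and random.random() < 0.9 else 0
    # In state 1, action 0 stays in state 1 and action 1 returns to state 0.
    return 1 if a == 0 else 0


random.seed(0)
policy, values = certainty_equivalent_policy(toy_sampler, REWARDS, gamma=0.9, n=200)
print(policy)  # [1, 0] whenever some sampled move from state 0 reaches state 1
```

In this sketch all of the statistical work happens in `estimate_model`; once $\widehat{M}$ is built, planning treats it as exact, which is what "certainty equivalence" refers to.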

