Regret Analysis in Deterministic Reinforcement Learning
Damianos Tranos, Alexandre Proutiere

TL;DR
This paper derives problem-specific logarithmic regret lower bounds for Markov Decision Processes with deterministic transitions. The bounds quantify the best performance any learning algorithm can achieve, and are stated explicitly for a line search problem and for an MDP with state-dependent rewards.
Contribution
It introduces problem-specific regret lower bounds for deterministic MDPs that depend explicitly on the system parameter, uses the graph (cycle) structure of these MDPs to identify a class whose lower bound can be computed numerically, and works the result out on two examples.
Findings
Regret lower bounds depend explicitly on the system parameter, in contrast to minimax bounds.
Deterministic MDPs can be analyzed via their cycle structures.
Navigation in a deterministic MDP need not affect learning performance.
Abstract
We consider Markov Decision Processes (MDPs) with deterministic transitions and study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms. We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter (in contrast to previous minimax approaches) and thus truly quantify the fundamental limit of performance achievable by any learning algorithm. Deterministic MDPs can be interpreted as graphs and analyzed in terms of their cycles, a fact which we leverage to identify a class of deterministic MDPs whose regret lower bound can be determined numerically. We further exemplify this result on a deterministic line search problem and a deterministic MDP with state-dependent rewards, whose regret lower bounds we can state explicitly. These bounds share similarities with the known…
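To make the graph view in the abstract concrete, here is a minimal sketch of how a deterministic MDP can be read as a weighted directed graph whose cycles determine long-run performance. It is not the paper's method or bound: it assumes an average-reward criterion and known rewards, and the function name `max_mean_cycle`, the `transitions` layout, and the toy numbers are all hypothetical. Under those assumptions, the best long-run average reward equals the largest mean reward over simple cycles, which the code finds by brute-force enumeration (only practical for small graphs).

```python
def max_mean_cycle(transitions):
    """Return (best mean reward, states on that cycle) over all simple cycles.

    transitions: dict state -> dict action -> (next_state, reward).
    Hypothetical layout chosen for illustration; each action has exactly
    one successor because transitions are deterministic.
    """
    best_mean, best_cycle = float("-inf"), None

    def extend(start, path, total):
        nonlocal best_mean, best_cycle
        for next_state, reward in transitions[path[-1]].values():
            if next_state == start:
                # Closing the cycle: a path of k states yields k edges.
                mean = (total + reward) / len(path)
                if mean > best_mean:
                    best_mean, best_cycle = mean, list(path)
            elif next_state not in path:
                extend(start, path + [next_state], total + reward)

    for state in transitions:
        extend(state, [state], 0.0)
    return best_mean, best_cycle


# Toy 3-state deterministic MDP (made-up rewards, illustration only).
example = {
    0: {"stay": (0, 0.2), "go": (1, 0.0)},
    1: {"go": (2, 1.0), "back": (0, 0.0)},
    2: {"go": (1, 0.5), "back": (0, 0.1)},
}

gain, cycle = max_mean_cycle(example)
print(gain, cycle)
```

In this toy instance the cycle through states 1 and 2 has the highest mean reward, so a policy that reaches it and stays on it is optimal in the average-reward sense. A learning algorithm that does not know the rewards has to estimate them while moving through the graph, which is the setting the regret lower bounds address.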
