Empirical Dynamic Programming
William B. Haskell, Rahul Jain, Dileep Kalathil

TL;DR
This paper introduces empirical dynamic programming algorithms for MDPs that replace expectations with empirical estimates, providing convergence analysis, sample complexity bounds, and demonstrating faster convergence than stochastic approximation methods.
Contribution
The paper develops empirical dynamic programming algorithms with convergence guarantees and sample complexity bounds, extending to asynchronous variants and specific applications like the newsvendor problem.
Findings
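The empirical algorithms converge faster than stochastic approximation algorithms.
Introduces notions of probabilistic fixed points for random monotone operators and a stochastic dominance framework for analyzing the convergence of empirical operators.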
Extends methods to minimax problems and practical applications.
Abstract
We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get 'empirical value iteration' (EVI). Policy evaluation and policy improvement in classical policy iteration are also replaced by simulation to get 'empirical policy iteration' (EPI). Thus, these empirical dynamic programming algorithms involve iteration of a random operator, the empirical Bellman operator. We introduce notions of probabilistic fixed points for such random monotone operators. We develop a stochastic dominance framework for convergence analysis of such operators. We then use this to give sample complexity bounds for both EVI and EPI. We then provide various variations and extensions to asynchronous empirical dynamic programming, the minimax…
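The core idea of EVI described in the abstract, replacing the exact expectation in the Bellman operator with a sample average, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a finite, discounted, cost-minimizing MDP, and the function names, the array layout `P[s, a, s']`, and the parameters `n_samples` and `n_iters` are hypothetical choices made for illustration. The transition array `P` is used only as a simulator from which next states are drawn.

```python
import numpy as np


def empirical_value_iteration(P, c, gamma, n_samples, n_iters, rng):
    """Sketch of EVI on a finite MDP (illustrative assumptions throughout).

    P: (S, A, S) transition probabilities, used only to simulate next states.
    c: (S, A) one-step costs. gamma: discount factor in (0, 1).
    """
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(n_iters):
        q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Draw fresh next-state samples in every iteration.
                next_states = rng.choice(S, size=n_samples, p=P[s, a])
                # Sample mean stands in for the exact expectation E[v(s')].
                q[s, a] = c[s, a] + gamma * v[next_states].mean()
        # Empirical Bellman update: minimize over actions.
        v = q.min(axis=1)
    return v


def exact_value_iteration(P, c, gamma, n_iters):
    """Classical value iteration with the exact expectation, for comparison."""
    v = np.zeros(P.shape[0])
    for _ in range(n_iters):
        v = (c + gamma * (P @ v)).min(axis=1)
    return v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 5, 2
    P = rng.dirichlet(np.ones(S), size=(S, A))  # random transition kernel
    c = rng.uniform(size=(S, A))                # random costs in [0, 1)
    v_emp = empirical_value_iteration(P, c, 0.9, 2000, 100, rng)
    v_exact = exact_value_iteration(P, c, 0.9, 500)
    print("sup-norm gap:", np.max(np.abs(v_emp - v_exact)))
```

Because each iteration applies a random operator, the iterates do not settle on a single point. They fluctuate in a neighborhood of the exact value function, and that neighborhood tends to shrink as `n_samples` grows, which is the kind of behavior the probabilistic fixed point notions are meant to capture.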
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Bandit Algorithms Research
