
TL;DR
This course provides a rigorous introduction to dynamic programming, the principle of optimality, and reinforcement learning techniques, with new convergence proofs for policy gradient methods in the average-reward setting.
Contribution
It offers a rigorous proof of the principle of optimality for upper semi-continuous models and introduces new convergence results for policy gradient methods in average-reward reinforcement learning.
Findings
Proof of the principle of optimality for upper semi-continuous models
Convergence proof for Q-learning algorithms
New convergence result for policy gradient methods in the average-reward case
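The Findings list a convergence proof for Q-learning. The notes' own setting and proof are not reproduced here. As a minimal sketch, assuming a hypothetical two-state, two-action discounted MDP (`P`, `R` and `GAMMA` are invented for illustration), the code runs tabular Q-learning with step sizes 1/n(s,a)^0.7. These satisfy the usual Robbins-Monro conditions (the steps sum to infinity, their squares do not). It then compares the result with the fixed point of the Bellman optimality operator, computed by value iteration.

```python
import random

# Hypothetical 2-state, 2-action MDP (illustrative, not from the course notes).
# P[s][a] is a list of (probability, next_state); R[s][a] is the one-step reward.
P = {
    0: {0: [(0.9, 0), (0.1, 1)], 1: [(0.2, 0), (0.8, 1)]},
    1: {0: [(0.7, 0), (0.3, 1)], 1: [(0.05, 0), (0.95, 1)]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}
GAMMA = 0.5
STATES, ACTIONS = [0, 1], [0, 1]


def value_iteration_q(tol=1e-12):
    """Apply the Bellman optimality operator to Q until it stops changing."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    while True:
        new_q = {
            (s, a): R[s][a] + GAMMA * sum(
                p * max(q[(s2, b)] for b in ACTIONS) for p, s2 in P[s][a]
            )
            for s in STATES for a in ACTIONS
        }
        if max(abs(new_q[k] - q[k]) for k in q) < tol:
            return new_q
        q = new_q


def sample_next_state(s, a, rng):
    """Draw the next state from P[s][a] by inverting the cumulative distribution."""
    u, acc = rng.random(), 0.0
    for p, s2 in P[s][a]:
        acc += p
        if u < acc:
            return s2
    return P[s][a][-1][1]  # guard against floating-point round-off


def q_learning(steps=200_000, seed=0):
    """Tabular Q-learning: sample (s, a) uniformly, then move Q(s, a) toward
    the sampled Bellman target with step size 1 / n(s, a) ** 0.7."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    visits = {k: 0 for k in q}
    for _ in range(steps):
        s, a = rng.choice(STATES), rng.choice(ACTIONS)
        s2 = sample_next_state(s, a, rng)
        visits[(s, a)] += 1
        alpha = 1.0 / visits[(s, a)] ** 0.7
        target = R[s][a] + GAMMA * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])
    return q


q_star = value_iteration_q()
q_hat = q_learning()
print(max(abs(q_hat[k] - q_star[k]) for k in q_star))  # sup-norm gap to Q*
```

Sampling every state-action pair uniformly keeps the sketch close to the "every pair is updated infinitely often" assumption used in standard convergence arguments. An epsilon-greedy behaviour policy would be the more common choice in practice.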
Abstract
These lecture notes are derived from a graduate-level course in dynamic optimization and introduce techniques and models used extensively in management science, economics, operations research, engineering, and computer science. The course emphasizes the theoretical underpinnings of discrete-time dynamic programming models and advanced algorithmic strategies for solving them. Unlike typical treatments, it provides a proof of the principle of optimality for upper semi-continuous dynamic programming, a middle ground between the simpler countable state space case \cite{bertsekas2012dynamic} and the more involved universally measurable case \cite{bertsekas1996stochastic}. This approach is rigorous enough to cover important examples such as dynamic pricing, consumption-savings, and inventory management models. The course also delves into the properties of value and…
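The abstract lists inventory management among the models that the principle of optimality covers. As an illustrative sketch only, assume a hypothetical finite lost-sales inventory model. `CAPACITY`, `PRICE`, `COST`, `HOLD`, `DEMAND` and `BETA` are invented parameters, not the course's. Value iteration applies the Bellman operator until it reaches its fixed point, then reads off a policy that is greedy with respect to that fixed point. This finite-state case sits at the simple end of the spectrum the notes describe and does not exercise the upper semi-continuous machinery.

```python
# Hypothetical lost-sales inventory model (illustrative parameters only).
CAPACITY = 10                       # maximum units on hand after ordering
PRICE, COST, HOLD = 4.0, 2.0, 0.5   # sale price, unit order cost, holding cost per leftover unit
DEMAND = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.25, 4: 0.15}  # demand probability mass function
BETA = 0.95                         # discount factor


def stage_value(x, q, v):
    """Expected reward of ordering q units with x on hand, plus the
    discounted value v of the leftover stock (unmet demand is lost)."""
    y = x + q  # stock available after ordering
    total = -COST * q
    for d, p in DEMAND.items():
        sold = min(y, d)
        left = y - sold
        total += p * (PRICE * sold - HOLD * left + BETA * v[left])
    return total


def value_iteration(tol=1e-10):
    """Iterate V <- max_q stage_value(x, q, V) until successive iterates agree."""
    v = [0.0] * (CAPACITY + 1)
    while True:
        new_v = [max(stage_value(x, q, v) for q in range(CAPACITY - x + 1))
                 for x in range(CAPACITY + 1)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


v = value_iteration()
# Greedy policy with respect to the computed fixed point.
policy = [max(range(CAPACITY - x + 1), key=lambda q: stage_value(x, q, v))
          for x in range(CAPACITY + 1)]
print(policy)
```

Here `policy[x]` is the number of units ordered when `x` units are on hand. Orders are restricted so that stock never exceeds `CAPACITY`, which keeps the state space finite.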
Taxonomy
Topics: Educational Technology and Optimization · Engineering Education and Pedagogy · Complex Systems and Decision Making
