Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming
Tadashi Kozuno, Eiji Uchibe, Kenji Doya

TL;DR
This paper introduces generalized value iteration (GVI), a dynamic programming algorithm that unifies value iteration, advantage learning, and dynamic policy programming, together with its approximate version, AGVI. AGVI comes with a performance guarantee, and numerical experiments in a simple environment support the theory.
Contribution
The paper proposes GVI, a robust dynamic programming algorithm that unifies several existing methods, and its approximate version AGVI, for which it gives a theoretical performance guarantee and empirical validation.
Findings
AGVI's performance guarantee includes the guarantees for existing algorithms as special cases.
Numerical experiments in a simple environment support the theoretical arguments.
AGVI shows promise as an alternative to previous algorithms.
Abstract
Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks. In this paper we propose a new, robust dynamic programming algorithm that unifies value iteration, advantage learning, and dynamic policy programming. We call it generalized value iteration (GVI), and its approximate version approximate GVI (AGVI). We show AGVI's performance guarantee, which includes performance guarantees for existing algorithms as special cases. We discuss theoretical weaknesses of existing algorithms and explain the advantages of AGVI. Numerical experiments in a simple environment support the theoretical arguments and suggest that AGVI is a promising alternative to…
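For readers unfamiliar with two of the methods the abstract names, the following is a minimal sketch of tabular value iteration and of advantage learning in one commonly used operator form, where the Bellman backup is reduced by alpha times the gap between the best action value and the current one. It is not the paper's GVI or AGVI, whose update rule is not given in this summary. The two-state MDP, the constants, and all function names are assumptions made up for illustration.

```python
GAMMA = 0.9  # discount factor (assumed for this toy example)

# Hypothetical deterministic MDP with 2 states and 2 actions.
# NEXT_STATE[s][a] is the successor state, REWARD[s][a] the reward.
NEXT_STATE = [[0, 1], [0, 1]]
REWARD = [[0.0, 1.0], [0.0, 2.0]]


def backup(q, alpha=0.0):
    """One synchronous sweep over all state-action pairs.

    alpha = 0 is ordinary value iteration on Q.
    0 < alpha < 1 is the advantage-learning form
        Q(s, a) <- TQ(s, a) - alpha * (max_b Q(s, b) - Q(s, a)),
    where T is the Bellman optimality operator.
    """
    new_q = []
    for s in range(len(q)):
        row = []
        for a in range(len(q[s])):
            bellman = REWARD[s][a] + GAMMA * max(q[NEXT_STATE[s][a]])
            row.append(bellman - alpha * (max(q[s]) - q[s][a]))
        new_q.append(row)
    return new_q


def solve(alpha=0.0, sweeps=2000):
    """Iterate the backup from an all-zero table."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(sweeps):
        q = backup(q, alpha)
    return q


def greedy(q):
    """Greedy action in each state."""
    return [row.index(max(row)) for row in q]


def action_gap(q):
    """Difference between best and worst action value in each state."""
    return [max(row) - min(row) for row in q]


q_vi = solve(alpha=0.0)
q_al = solve(alpha=0.5)
print("greedy (VI):", greedy(q_vi), "gaps:", action_gap(q_vi))
print("greedy (AL):", greedy(q_al), "gaps:", action_gap(q_al))
```

In this toy problem both iterations select the same greedy actions, while advantage learning widens the gap between the best and the other action by a factor of 1 / (1 - alpha). A wider action gap is the usual motivation for advantage learning when value estimates are noisy.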
Taxonomy
Topics: Reinforcement Learning in Robotics · Auction Theory and Applications · Economic theories and models
