Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

Tadashi Kozuno; Eiji Uchibe; Kenji Doya

arXiv:1710.10866 · stat.ML · October 31, 2017 · 2 citations

TL;DR

This paper introduces generalized value iteration (GVI), a dynamic programming algorithm that unifies value iteration, advantage learning, and dynamic policy programming, together with its approximate version, AGVI. It provides theoretical performance guarantees and experimental results that support them.

Contribution

The paper proposes GVI and its approximate counterpart AGVI, a robust dynamic programming scheme that unifies several existing methods, and backs it with theoretical performance guarantees and empirical validation.

Findings

1. AGVI's performance guarantee includes the guarantees for existing algorithms as special cases.
2. Numerical experiments in a simple environment support AGVI's theoretical advantages.
3. AGVI is a promising alternative to previous algorithms.

Abstract

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks. In this paper we propose a new, robust dynamic programming algorithm that unifies value iteration, advantage learning, and dynamic policy programming. We call it generalized value iteration (GVI), and its approximate version approximate GVI (AGVI). We derive a performance guarantee for AGVI that includes the performance guarantees of existing algorithms as special cases. We discuss theoretical weaknesses of existing algorithms and explain the advantages of AGVI. Numerical experiments in a simple environment support the theoretical arguments and suggest that AGVI is a promising alternative to…
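The abstract does not state GVI's update rule. As a rough illustration only, the sketch below assumes a gap-increasing update of the form Ψ_{k+1} = r + γ P m_β Ψ_k + α (Ψ_k − m_β Ψ_k), where m_β is the Boltzmann softmax (a Boltzmann-weighted average of action values). This form is chosen because it reduces to value iteration (α = 0, β = ∞), advantage learning (0 < α < 1, β = ∞), and a dynamic-policy-programming-style update (α = 1, finite β). Treat the update form and the helper names `boltzmann_softmax`, `gvi_step`, and `run_gvi` as hypothetical, not as the paper's definitions.

```python
import math
from typing import List

# Hypothetical tabular MDP layout (our assumption, not from the paper):
#   P[x][a][y] = probability of moving from state x to state y under action a
#   R[x][a]    = expected reward for taking action a in state x
Matrix = List[List[float]]


def boltzmann_softmax(row: List[float], beta: float) -> float:
    """Boltzmann-weighted average of action values; beta = inf gives the max."""
    if math.isinf(beta):
        return max(row)
    top = max(row)  # shift by the max so exp() cannot overflow
    weights = [math.exp(beta * (v - top)) for v in row]
    return sum(w * v for w, v in zip(weights, row)) / sum(weights)


def gvi_step(psi: Matrix, P: List[Matrix], R: Matrix,
             gamma: float, alpha: float, beta: float) -> Matrix:
    """One assumed GVI-style backup:
    psi'(x,a) = R(x,a) + gamma * E_y[m(y)] + alpha * (psi(x,a) - m(x)),
    where m(x) is the Boltzmann softmax of psi(x, .)."""
    n_states, n_actions = len(psi), len(psi[0])
    m = [boltzmann_softmax(psi[x], beta) for x in range(n_states)]
    new_psi = []
    for x in range(n_states):
        row = []
        for a in range(n_actions):
            expected_next = sum(P[x][a][y] * m[y] for y in range(n_states))
            backup = R[x][a] + gamma * expected_next
            # The alpha term widens the gap between each action and the softmax value.
            row.append(backup + alpha * (psi[x][a] - m[x]))
        new_psi.append(row)
    return new_psi


def run_gvi(P: List[Matrix], R: Matrix, gamma: float, alpha: float,
            beta: float, iters: int) -> Matrix:
    """Iterate the assumed update from an all-zero table."""
    psi = [[0.0] * len(R[0]) for _ in R]
    for _ in range(iters):
        psi = gvi_step(psi, P, R, gamma, alpha, beta)
    return psi


if __name__ == "__main__":
    # One state, two actions, both looping back to the same state.
    P = [[[1.0], [1.0]]]
    R = [[1.0, 0.0]]
    print(run_gvi(P, R, 0.9, 0.0, float("inf"), 400))  # value-iteration case
    print(run_gvi(P, R, 0.9, 0.5, float("inf"), 400))  # advantage-learning case
```

On this one-state loop with rewards 1 and 0 and γ = 0.9, the α = 0 case converges to action values 10 and 9, while α = 0.5 keeps the greedy value at 10 but pushes the other action down to 8, widening the action gap from 1 to 2 (that is, by a factor of 1/(1 − α)) without changing the greedy action.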

Taxonomy

Topics: Reinforcement Learning in Robotics · Auction Theory and Applications · Economic theories and models