Lagrangian Duality in Reinforcement Learning

Pranay Pasula

arXiv:2007.09998 · cs.LG · July 28, 2020

TL;DR

This paper explores the role of duality in reinforcement learning, highlighting its presence in classical and recent methods, and showing how it facilitates solving complex RL problems through convex optimization techniques.

Contribution

The paper shows that duality is widespread in RL, connects classical algorithms with modern approaches, and emphasizes the role duality plays in making problems tractable.

Findings

01. Duality appears in value iteration and dynamic programming.

02. Modern RL methods such as TRPO, A3C, and GAIL involve duality concepts.

03. Duality arises when first- or second-order approximations are used to turn initially intractable RL problems into tractable convex programs.

Abstract

Although duality is used extensively in certain fields, such as supervised learning in machine learning, it has been much less explored in others, such as reinforcement learning (RL). In this paper, we show how duality is involved in a variety of RL work, from that which spearheaded the field, such as Richard Bellman's value iteration, to that which was done within just the past few years yet has already had significant impact, such as TRPO, A3C, and GAIL. We show that duality is not uncommon in reinforcement learning, especially when value iteration, or dynamic programming, is used or when first or second order approximations are made to transform initially intractable problems into tractable convex programs.
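
As a concrete illustration of the claim that duality shows up wherever value iteration is used, the sketch below relies on the standard linear-programming view of a discounted MDP. It is not necessarily the formulation the paper uses. In that view, the optimal value function solves a primal LP (minimize the initial-state-weighted value subject to the Bellman inequalities), and the dual variables are discounted state-action occupancy measures. The tiny MDP, its numbers, and every function name here are hypothetical and chosen only for illustration. The code runs value iteration, builds the occupancy measure of the greedy policy, and checks that the primal and dual objectives coincide (strong duality).

```python
# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
GAMMA = 0.9
N_S, N_A = 2, 2
# P[s][a][s2] = probability of moving from s to s2 under action a
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]
R = [[1.0, 0.0], [0.0, 2.0]]  # R[s][a] = immediate reward
MU = [0.5, 0.5]               # initial state distribution (primal objective weights)


def q_value(V, s, a):
    """One-step lookahead: r(s, a) + gamma * E[V(s')]."""
    return R[s][a] + GAMMA * sum(P[s][a][s2] * V[s2] for s2 in range(N_S))


def value_iteration(iters=2000):
    """Repeatedly apply the Bellman optimality operator starting from V = 0."""
    V = [0.0] * N_S
    for _ in range(iters):
        V = [max(q_value(V, s, a) for a in range(N_A)) for s in range(N_S)]
    return V


def greedy_policy(V):
    """Deterministic policy that picks the maximizing action in each state."""
    return [max(range(N_A), key=lambda a: q_value(V, s, a)) for s in range(N_S)]


def occupancy(policy, iters=2000):
    """Discounted state occupancy d of a deterministic policy, obtained by
    iterating the dual flow constraint
    d(s2) = MU(s2) + gamma * sum_s P(s2 | s, policy(s)) * d(s)."""
    d = MU[:]
    for _ in range(iters):
        d = [
            MU[s2] + GAMMA * sum(P[s][policy[s]][s2] * d[s] for s in range(N_S))
            for s2 in range(N_S)
        ]
    return d


V_star = value_iteration()
pi = greedy_policy(V_star)
d = occupancy(pi)

# Primal objective: sum_s MU(s) * V*(s)
primal = sum(MU[s] * V_star[s] for s in range(N_S))
# Dual objective: sum_s d(s) * r(s, pi(s)), with all mass on the greedy action
dual = sum(d[s] * R[s][pi[s]] for s in range(N_S))

print("primal objective:", primal)
print("dual objective:  ", dual)
```

The two printed values should agree up to floating-point error. This is the LP form of the statement that the value function and the occupancy measure are two descriptions of the same optimal solution.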

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications

Methods: Generative Adversarial Imitation Learning · Dense Connections · Convolution · Entropy Regularization · Softmax · Trust Region Policy Optimization · A3C