Lagrangian Duality in Reinforcement Learning
Pranay Pasula

TL;DR
This paper explores the role of duality in reinforcement learning, highlighting its presence in classical and recent methods, and showing how it facilitates solving complex RL problems through convex optimization techniques.
Contribution
The paper shows that duality appears throughout RL, connects classical algorithms with modern approaches, and argues that duality is central to making these problems tractable.
Findings
Duality appears in value iteration and dynamic programming.
Modern RL methods like TRPO, A3C, and GAIL involve duality concepts.
Duality helps transform intractable RL problems into convex programs.
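As one illustration of the last finding, consider the trust-region step used by TRPO-style methods after a first-order approximation of the objective and a second-order approximation of the KL constraint. This is a standard textbook derivation and not necessarily the paper's own presentation. The symbols are assumptions introduced here: g is the policy gradient, F is a positive definite Fisher information matrix, δ > 0 is the trust-region radius, and λ is the Lagrange multiplier.

```latex
\begin{aligned}
&\text{Primal:} && \max_{x}\; g^\top x \quad \text{s.t.} \quad \tfrac{1}{2}\, x^\top F x \le \delta \\
&\text{Lagrangian:} && L(x,\lambda) = g^\top x - \lambda\left(\tfrac{1}{2}\, x^\top F x - \delta\right), \quad \lambda \ge 0 \\
&\text{Stationarity in } x\text{:} && g - \lambda F x = 0 \;\Rightarrow\; x(\lambda) = \tfrac{1}{\lambda} F^{-1} g \\
&\text{Dual function:} && q(\lambda) = \max_x L(x,\lambda) = \tfrac{1}{2\lambda}\, g^\top F^{-1} g + \lambda \delta \\
&\text{Dual optimum:} && \lambda^\star = \sqrt{\frac{g^\top F^{-1} g}{2\delta}} \\
&\text{Primal optimum:} && x^\star = \sqrt{\frac{2\delta}{g^\top F^{-1} g}}\; F^{-1} g
\end{aligned}
```

Substituting x* back into the constraint gives ½ x*ᵀ F x* = δ, so the constraint is active and the primal and dual optimal values coincide at √(2δ gᵀF⁻¹g).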
Abstract
Although duality is used extensively in certain fields, such as supervised learning in machine learning, it has been much less explored in others, such as reinforcement learning (RL). In this paper, we show how duality is involved in a variety of RL work, from that which spearheaded the field, such as Richard Bellman's value iteration, to that which was done within just the past few years yet has already had significant impact, such as TRPO, A3C, and GAIL. We show that duality is not uncommon in reinforcement learning, especially when value iteration, or dynamic programming, is used or when first- or second-order approximations are made to transform initially intractable problems into tractable convex programs.
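The connection between value iteration and duality can be made concrete with the linear-programming view of a discounted MDP. In that view the optimal value function solves the primal LP, and discounted state-action occupancy measures are the dual variables. The sketch below is a minimal illustration under stated assumptions, not the paper's code: the two-state MDP (P, R, MU0, GAMMA) and all function names are hypothetical. It does not call an LP solver. Instead it runs value iteration, builds the occupancy measure of the greedy policy (a dual-feasible point), and checks that the primal and dual objectives agree, which by weak duality certifies that both are optimal.

```python
# Hypothetical two-state, two-action MDP used only for illustration.
GAMMA = 0.9
# P[s][a] is the next-state distribution; R[s][a] is the immediate reward.
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
MU0 = [0.5, 0.5]  # initial state distribution
N_STATES = len(P)


def q_value(V, s, a):
    """One-step lookahead: r(s, a) + gamma * E[V(s')]."""
    return R[s][a] + GAMMA * sum(p * v for p, v in zip(P[s][a], V))


def value_iteration(tol=1e-12, max_iters=10_000):
    """Iterate the Bellman optimality operator until successive values agree."""
    V = [0.0] * N_STATES
    for _ in range(max_iters):
        new_V = [max(q_value(V, s, a) for a in range(len(P[s])))
                 for s in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    return V


def greedy_policy(V):
    """Deterministic policy that picks the action with the largest Q-value."""
    return [max(range(len(P[s])), key=lambda a: q_value(V, s, a))
            for s in range(N_STATES)]


def occupancy(policy, tol=1e-12, max_iters=10_000):
    """Discounted visitation d(s) = sum_t gamma^t Pr(s_t = s) under `policy`.

    Solves the dual flow constraint d = mu0 + gamma * P_pi^T d by fixed-point
    iteration.
    """
    d = list(MU0)
    for _ in range(max_iters):
        new_d = [MU0[s2] + GAMMA * sum(d[s] * P[s][policy[s]][s2]
                                       for s in range(N_STATES))
                 for s2 in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(new_d, d)) < tol:
            return new_d
        d = new_d
    return d


V_star = value_iteration()
pi = greedy_policy(V_star)
d = occupancy(pi)

# Primal objective: expected optimal value under the initial distribution.
primal = sum(m * v for m, v in zip(MU0, V_star))
# Dual objective: reward accumulated along the discounted occupancy measure.
dual = sum(d[s] * R[s][pi[s]] for s in range(N_STATES))

print("greedy policy:", pi)
print("primal objective:", primal)
print("dual objective:", dual)
```

The fixed-point iteration is used here only to keep the example dependency-free. For larger problems, the same check could be done by solving both LPs with a general-purpose solver.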
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications
Methods: Generative Adversarial Imitation Learning · Dense Connections · Convolution · Entropy Regularization · Softmax · Trust Region Policy Optimization · A3C
