A Stochastic Composite Augmented Lagrangian Method For Reinforcement Learning

Yongfeng Li, Mingming Zhao, Weijie Chen, and Zaiwen Wen

arXiv:2105.09716·math.OC·May 21, 2021

TL;DR

This paper introduces a deep reinforcement learning method built on a stochastic composite augmented Lagrangian approach to the linear programming (LP) formulation. It avoids the double-sampling obstacle and comes with a convergence analysis toward the LP's optimal solution, targeting large or continuous environments.

Contribution

The paper proposes a deep parameterized augmented Lagrangian method that replaces the intractable conditional expectations in the augmented Lagrangian function with the multipliers, which makes the minimization step amenable to sampling and supports a convergence analysis.

Findings

01 The method is theoretically proven to converge to the optimal solution of the LP.

02 The residuals can be made arbitrarily small with a proper choice of parameters.

03 Preliminary experiments show competitive performance against state-of-the-art algorithms.

Abstract

In this paper, we consider the linear programming (LP) formulation for deep reinforcement learning. The number of constraints depends on the size of the state and action spaces, which makes the problem intractable in large or continuous environments. The general augmented Lagrangian method suffers from the double-sampling obstacle in solving the LP. Namely, the conditional expectations originating from the constraint functions and the quadratic penalties in the augmented Lagrangian function impose difficulties in sampling and evaluation. Motivated by the updates of the multipliers, we overcome the obstacles in minimizing the augmented Lagrangian function by replacing the intractable conditional expectations with the multipliers. Therefore, a deep parameterized augmented Lagrangian method is proposed. Furthermore, the replacement provides a promising breakthrough to integrate the two steps in…
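The double-sampling obstacle named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single state-action pair, a plain squared constraint residual standing in for the quadratic penalty, and hypothetical toy values (`GAMMA`, `REWARD`, `V_S`, `NEXT`) chosen only for illustration. It computes exactly, by enumeration, the quantity a penalty needs (the square of a conditional expectation) and compares it with what one sampled next state per pair and two independent next states per pair estimate on average.

```python
from itertools import product

# Hypothetical toy values for one fixed (s, a); none are taken from the paper.
GAMMA = 0.9                         # discount factor
REWARD = 1.0                        # r(s, a)
V_S = 5.0                           # current value estimate V(s)
NEXT = [(0.5, 0.0), (0.5, 10.0)]    # (probability, V(s')) over next states s'


def residual(v_next):
    """Residual r(s, a) + gamma * V(s') - V(s) for one sampled next state."""
    return REWARD + GAMMA * v_next - V_S


# What a quadratic penalty needs: the square of the conditional expectation.
mean_residual = sum(p * residual(v) for p, v in NEXT)
target = mean_residual ** 2

# One next state per (s, a): the squared residual averages to E[residual^2].
single_sample = sum(p * residual(v) ** 2 for p, v in NEXT)

# Two independent next states per (s, a): the product is unbiased for target.
double_sample = sum(
    p1 * p2 * residual(v1) * residual(v2)
    for (p1, v1), (p2, v2) in product(NEXT, NEXT)
)

# The single-sample bias equals the conditional variance of the residual.
variance = sum(p * (residual(v) - mean_residual) ** 2 for p, v in NEXT)

print(f"target (E[res])^2       = {target:.2f}")
print(f"single-sample E[res^2]  = {single_sample:.2f}")
print(f"double-sample E[r1*r2]  = {double_sample:.2f}")
print(f"single-sample bias      = {single_sample - target:.2f}")
```

With these toy values the single-sample estimate overshoots the target by exactly the variance of the residual, while the two-sample product matches it. Drawing two independent next states from the same state-action pair is rarely possible when interacting with an environment, which is why the abstract describes these conditional expectations as intractable.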
