A constrained optimization perspective on actor critic algorithms and application to network routing

Prashanth L.A.; H.L. Prasad; Shalabh Bhatnagar; Prakash Chandra

arXiv:1507.07984·cs.LG·July 30, 2015

TL;DR

This paper introduces a new actor-critic algorithm with guaranteed convergence to an optimal policy for discounted-reward Markov decision processes, discusses an extension that incorporates function approximation, and demonstrates the algorithms on a network routing application.

Contribution

The paper presents a novel actor-critic method in which the actor follows a descent direction motivated by the solution of a non-linear optimization problem. The method comes with a convergence guarantee and is shown to be practical on a network routing application.

Findings

01 Guaranteed convergence to an optimal policy for a discounted-reward Markov decision process

02 Extension that incorporates function approximation

03 Practical application to network routing

Abstract

We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear optimization problem. We also discuss an extension to incorporate function approximation and demonstrate the practicality of our algorithms on a network routing application.
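
For readers unfamiliar with the actor-critic setting the abstract refers to, the following is a minimal sketch of a textbook tabular actor-critic loop on a discounted-reward MDP. It is not the algorithm proposed in the paper: the actor here uses a standard softmax policy-gradient step driven by the TD error, not the descent direction derived from the paper's non-linear optimization problem. The two-state MDP, the function name `run_actor_critic`, and all step sizes are hypothetical choices made only for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(steps=20000, gamma=0.5, critic_lr=0.1, actor_lr=0.05, seed=0):
    """Generic tabular actor-critic on a hypothetical 2-state, 2-action MDP.

    Assumed toy model (not from the paper): taking action a yields
    reward a and moves the process to state a.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    v = [0.0] * n_states                                  # critic: state values
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor: preferences
    s = 0
    for _ in range(steps):
        pi = softmax(theta[s])
        a = rng.choices(range(n_actions), weights=pi)[0]
        r = float(a)      # toy reward: action 1 pays 1, action 0 pays 0
        s_next = a        # toy deterministic transition
        # Critic: one-step temporal-difference error for the discounted return.
        delta = r + gamma * v[s_next] - v[s]
        v[s] += critic_lr * delta
        # Actor: softmax policy-gradient step scaled by the TD error.
        for b in range(n_actions):
            grad = (1.0 if b == a else 0.0) - pi[b]
            theta[s][b] += actor_lr * delta * grad
        s = s_next
    policy = [softmax(row) for row in theta]
    return v, policy


if __name__ == "__main__":
    values, policy = run_actor_critic()
    print("state values:", values)
    print("policy:", policy)
```

In this toy model action 1 is better in both states, so the learned policy should come to favor it. The critic estimates state values while the actor adjusts the policy using the critic's feedback, which is the division of labor that the paper's algorithm also builds on.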

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Control Systems Optimization