Deep reinforcement learning for weakly coupled MDP's with continuous actions

Francisco Robledo (LMAP, UPPA, UPV/EHU); Urtzi Ayesta (IRIT-RMESS, UPV/EHU, CNRS); Konstantin Avrachenkov (Inria)

arXiv:2406.01099·cs.LG·June 13, 2024


TL;DR

This paper proposes the Lagrange Policy for Continuous Actions (LPCA), a reinforcement learning algorithm for weakly coupled MDPs with continuous actions. It handles resource constraints that depend on those actions by applying a Lagrange relaxation inside the neural network that computes Q-values.

Contribution

The paper introduces LPCA, a new RL algorithm that decouples weakly coupled MDPs with continuous actions using Lagrange relaxation within neural networks, enabling efficient resource-constrained policy learning.

Findings

01  LPCA outperforms existing methods in resource management tasks.

02  LPCA demonstrates robustness and efficiency across various settings.

03  The approach effectively balances reward maximization and resource constraints.

Abstract

This paper introduces the Lagrange Policy for Continuous Actions (LPCA), a reinforcement learning algorithm specifically designed for weakly coupled MDP problems with continuous action spaces. LPCA addresses the challenge of resource constraints dependent on continuous actions by introducing a Lagrange relaxation of the weakly coupled MDP problem within a neural network framework for Q-value computation. This approach effectively decouples the MDP, enabling efficient policy learning in resource-constrained environments. We present two variations of LPCA: LPCA-DE, which utilizes differential evolution for global optimization, and LPCA-Greedy, a method that incrementally and greedily selects actions based on Q-value gradients. Comparative analysis against other state-of-the-art techniques across various settings highlights LPCA's robustness and efficiency in managing resource allocation…
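
The greedy idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the per-arm Q-functions below are hypothetical closed-form stand-ins for the learned neural networks, the gradient is approximated by a finite difference, and the names `greedy_allocate`, `make_q`, `step`, and `a_max` are assumptions made for illustration. It only shows how a shared budget might be spent in small increments on whichever arm currently has the largest marginal Q-value gain.

```python
from typing import Callable, List


def greedy_allocate(
    q_funcs: List[Callable[[float], float]],
    budget: float,
    step: float = 0.01,
    a_max: float = 1.0,
) -> List[float]:
    """Spend `budget` in increments of `step`, one arm at a time.

    q_funcs[i](a) is a hypothetical stand-in for the Q-value of arm i
    as a function of its continuous action a (current state held fixed).
    """
    actions = [0.0] * len(q_funcs)
    n_steps = int(round(budget / step))
    for _ in range(n_steps):
        best_i, best_gain = None, 0.0
        for i, q in enumerate(q_funcs):
            # Respect the per-arm action bound.
            if actions[i] + step > a_max + 1e-12:
                continue
            # Finite-difference proxy for the Q-value gradient.
            gain = q(actions[i] + step) - q(actions[i])
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:
            # No arm improves its Q-value; leave the rest of the budget unused.
            break
        actions[best_i] += step
    return actions


def make_q(weight: float) -> Callable[[float], float]:
    # Toy concave Q-function (assumption): diminishing returns in the action.
    return lambda a: weight * a - 0.5 * a * a


# Two arms sharing a total action budget of 1.0.
demo_budget = 1.0
demo_actions = greedy_allocate([make_q(1.0), make_q(0.5)], budget=demo_budget)
print(demo_actions)
```

With concave toy Q-functions like these, the increments end up roughly equalizing the marginal gains across arms, which is the behavior a Lagrange multiplier on the shared constraint would also enforce. With learned, possibly non-concave Q-networks that guarantee no longer holds, which is presumably one reason the abstract also offers a global optimizer (differential evolution) as an alternative.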
