FlowPG: Action-constrained Policy Gradient with Normalizing Flows

Janaka Chathuranga Brahmanage; Jiajing Ling; Akshat Kumar

arXiv:2402.05149 · cs.LG · February 9, 2024 · 1 citation


TL;DR

FlowPG introduces a normalizing flow-based policy gradient method that efficiently enforces action constraints in reinforcement learning, reducing violations and training time without complex optimization steps.

Contribution

This paper proposes using normalizing flows to directly model feasible actions, eliminating the need for projection layers and improving training efficiency in constrained RL.
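To make the idea of an invertible, differentiable mapping concrete, here is a minimal sketch of a single affine coupling layer, a common building block of normalizing flows. It is an illustration only: the functions `s` and `t`, their fixed coefficients, and the two-dimensional setting are assumptions made for this example and are not taken from the paper or its repository. The layer shown is untrained, so it demonstrates exact invertibility and a cheap log-determinant, not a mapping onto any particular feasible action set.

```python
import math

# Hypothetical scale and shift functions. In a real flow these would be
# small neural networks with learned parameters.
def s(z1):
    return math.tanh(0.5 * z1)

def t(z1):
    return 0.3 * z1

def forward(z):
    """Map a latent point z = (z1, z2) to an action-like point a = (a1, a2)."""
    z1, z2 = z
    a1 = z1                              # first coordinate passes through
    a2 = z2 * math.exp(s(z1)) + t(z1)    # second is scaled and shifted
    log_det = s(z1)                      # log |det Jacobian| of the layer
    return (a1, a2), log_det

def inverse(a):
    """Recover the latent point from a; exact because a1 == z1."""
    a1, a2 = a
    z2 = (a2 - t(a1)) * math.exp(-s(a1))
    return (a1, z2)

z = (0.7, -1.2)
a, log_det = forward(z)
z_rec = inverse(a)
print(a, log_det, z_rec)
```

Because the first coordinate is copied unchanged, the inverse can recompute `s` and `t` from it and undo the scale and shift exactly. Stacking several such layers, with the coordinates swapped between layers, gives a more expressive mapping that stays invertible and differentiable.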

Findings

01

Significantly fewer constraint violations achieved

02

Up to an order-of-magnitude reduction in violations

03

Multiple times faster training on continuous control tasks

Abstract

Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation-related decision-making problems. A major challenge in ACRL is to ensure that the agent takes a valid action satisfying the constraints at each RL step. The commonly used approach of placing a projection layer on top of the policy network requires solving an optimization program, which can result in longer training time, slow convergence, and the zero-gradient problem. To address this, first we use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as a Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is also challenging. We develop multiple methods, based on Hamiltonian Monte-Carlo and probabilistic sentential decision…
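The zero-gradient problem mentioned in the abstract can be illustrated with a toy case. The sketch below assumes a one-dimensional box constraint on the action, for which projection reduces to clipping; this setting and the helper names `project_box` and `finite_diff` are assumptions for illustration, whereas general constraints would require solving an optimization program as the abstract notes.

```python
def project_box(a, lo=-1.0, hi=1.0):
    """Project a scalar action onto the interval [lo, hi] (assumed constraint)."""
    return min(max(a, lo), hi)

def finite_diff(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Inside the feasible interval the projection is the identity,
# so the derivative with respect to the raw action is about 1.
grad_inside = finite_diff(project_box, 0.5)

# Outside the interval the projected action is pinned to the boundary,
# so small changes in the raw action do not change the output.
grad_outside = finite_diff(project_box, 2.0)

print(grad_inside, grad_outside)
```

When the raw policy output lies outside the feasible set, the derivative of the projected action with respect to that output vanishes in this example, so no learning signal flows back through the projection to the policy parameters.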


Code & Models

Repositories

rlr-smu/flow-pg
PyTorch · Official

Videos

FlowPG: Action-constrained Policy Gradient with Normalizing Flows · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Advanced Bandit Algorithms Research

Methods: Dense Connections · Batch Normalization · Convolution · Weight Decay · Experience Replay · Adam · Deep Deterministic Policy Gradient