Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching

Robert Denkert; Huyên Pham; Xavier Warin

arXiv:2404.17939·math.OC·May 1, 2024·1 cites

TL;DR

This paper introduces a unified policy gradient framework for continuous-time reinforcement learning, leveraging control randomisation to handle diverse Markovian control problems including switching and stopping, with practical energy sector applications.

Contribution

It develops a novel policy gradient representation using control randomisation and change of measure, extending reinforcement learning methods to a broad class of continuous-time stochastic control problems.

Findings

01. Effective actor-critic algorithms for Markovian control problems

02. Application to energy sector real options demonstrates practical utility

03. New policy gradient formula for randomized control problems

Abstract

We propose a comprehensive framework for policy gradient methods tailored to continuous time reinforcement learning. This is based on the connection between stochastic control problems and randomised problems, enabling applications across various classes of Markovian continuous time control problems, beyond diffusion models, including e.g. regular, impulse and optimal stopping/switching problems. By utilizing change of measure in the control randomisation technique, we derive a new policy gradient representation for these randomised problems, featuring parametrised intensity policies. We further develop actor-critic algorithms specifically designed to address general Markovian stochastic control issues. Our framework is demonstrated through its application to optimal switching problems, with two numerical case studies in the energy sector focusing on real options.
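
To make the idea of a parametrised intensity policy concrete, the following is a minimal sketch and not the paper's algorithm or its gradient representation. It uses a generic score-function (REINFORCE) estimator on a time-discretised two-mode switching toy problem. The log-linear intensity, the mean-reverting price dynamics, the strike, the switching cost and all function names are assumptions made purely for illustration.

```python
import math
import random

def intensity(theta, mode, x):
    # Hypothetical log-linear intensity of leaving `mode` at price x.
    a, b = theta[mode]
    return math.exp(a + b * x)

def switch_score(lam, dt, switched):
    # Derivative w.r.t. log(lam) of the log-probability of the observed
    # decision, where P(switch within one step) = 1 - exp(-lam * dt).
    p = 1.0 - math.exp(-lam * dt)
    if switched:
        return lam * dt * (1.0 - p) / p
    return -lam * dt

def rollout(theta, rng, T=1.0, n_steps=50, strike=1.0, cost=0.05):
    # Simulate one path; return its total reward and accumulated score.
    dt = T / n_steps
    x, mode, reward = 1.0, 0, 0.0
    score = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_steps):
        lam = intensity(theta, mode, x)
        switched = rng.random() < 1.0 - math.exp(-lam * dt)
        s = switch_score(lam, dt, switched)
        score[mode][0] += s        # feature 1 (offset a)
        score[mode][1] += s * x    # feature x (slope b)
        if switched:
            mode, reward = 1 - mode, reward - cost
        if mode == 1:
            reward += (x - strike) * dt  # running profit while "on"
        # Euler step of an assumed mean-reverting price.
        x += 2.0 * (1.0 - x) * dt + 0.3 * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return reward, score

def policy_gradient(theta, n_paths=2000, seed=0):
    # Monte Carlo estimate of the value and its gradient in theta.
    rng = random.Random(seed)
    paths = [rollout(theta, rng) for _ in range(n_paths)]
    # Sample-mean baseline: reduces variance at the price of a small bias.
    baseline = sum(r for r, _ in paths) / n_paths
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for r, sc in paths:
        for m in range(2):
            for k in range(2):
                grad[m][k] += (r - baseline) * sc[m][k] / n_paths
    return baseline, grad
```

Calling `policy_gradient([[0.0, 0.0], [0.0, 0.0]])` returns an estimated value together with a 2×2 gradient, one row per mode, which could feed a plain gradient-ascent step on `theta`. An actor-critic variant would replace the sample-mean baseline with a learned value function.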

Taxonomy

Topics: Reinforcement Learning in Robotics · Viral Infectious Diseases and Gene Expression in Insects · Electric Vehicles and Infrastructure

Methods: Diffusion