Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching
Robert Denkert, Huyên Pham, Xavier Warin

TL;DR
This paper introduces a unified policy gradient framework for continuous-time reinforcement learning, leveraging control randomisation to handle diverse Markovian control problems including switching and stopping, with practical energy sector applications.
Contribution
It develops a novel policy gradient representation using control randomisation and change of measure, extending reinforcement learning methods to a broad class of continuous-time stochastic control problems.
Findings
Effective actor-critic algorithms for Markovian control problems
Application to energy sector real options demonstrates practical utility
New policy gradient formula for randomised control problems
Abstract
We propose a comprehensive framework for policy gradient methods tailored to continuous-time reinforcement learning. The framework rests on the connection between stochastic control problems and randomised problems, which enables applications across various classes of Markovian continuous-time control problems beyond diffusion models, including regular control, impulse control, and optimal stopping/switching problems. By using a change of measure in the control randomisation technique, we derive a new policy gradient representation for these randomised problems, featuring parametrised intensity policies. We further develop actor-critic algorithms designed for general Markovian stochastic control problems. We demonstrate the framework on optimal switching problems, with two numerical case studies on real options in the energy sector.
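To make the idea of a parametrised intensity policy concrete, here is a minimal sketch on a toy two-regime switching problem. It is not the authors' algorithm: it uses a generic score-function (REINFORCE-style) estimator on a time-discretised problem, not the change-of-measure representation derived in the paper, and it has no critic. Everything in it is an assumption made for illustration: the Ornstein-Uhlenbeck price dynamics, the reward `regime * (x - strike)`, the log-linear intensity `exp(theta[i][0] + theta[i][1] * x)`, and all parameter values and function names.

```python
import math
import random


def simulate_episode(theta, rng, T=1.0, n_steps=50, x0=0.0,
                     kappa=2.0, sigma=0.5, strike=0.0, switch_cost=0.05):
    """Simulate one path of a toy on/off switching problem (hypothetical setup).

    Returns the total reward and the score, i.e. the gradient with respect to
    theta of the log-likelihood of the sampled switch/no-switch decisions.
    """
    dt = T / n_steps
    x, regime, reward = x0, 0, 0.0
    score = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_steps):
        feat = (1.0, x)
        # Log-linear switching intensity for the current regime (assumed form).
        lam = math.exp(theta[regime][0] + theta[regime][1] * x)
        # Probability of at least one switch signal in a step of length dt.
        p_switch = -math.expm1(-lam * dt)
        acting_regime = regime
        if rng.random() < p_switch:
            # d/dtheta log(1 - exp(-lam * dt)), using d lam / d theta = lam * feat.
            weight = lam * dt * math.exp(-lam * dt) / p_switch
            regime = 1 - regime
            reward -= switch_cost
        else:
            # d/dtheta log(exp(-lam * dt)) = -lam * dt * feat.
            weight = -lam * dt
        for k in range(2):
            score[acting_regime][k] += weight * feat[k]
        # Running reward is earned only in the "on" regime (regime == 1).
        reward += regime * (x - strike) * dt
        # Euler step of an Ornstein-Uhlenbeck price process.
        x += -kappa * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return reward, score


def policy_gradient_estimate(theta, n_episodes=256, seed=0):
    """Monte Carlo score-function gradient estimate with a mean-reward baseline."""
    rng = random.Random(seed)
    episodes = [simulate_episode(theta, rng) for _ in range(n_episodes)]
    baseline = sum(r for r, _ in episodes) / n_episodes
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for r, score in episodes:
        for i in range(2):
            for k in range(2):
                grad[i][k] += (r - baseline) * score[i][k] / n_episodes
    return grad


if __name__ == "__main__":
    theta = [[0.0, 0.0], [0.0, 0.0]]
    grad = policy_gradient_estimate(theta)
    # One plain gradient-ascent step; the step size 0.1 is arbitrary.
    theta = [[theta[i][k] + 0.1 * grad[i][k] for k in range(2)] for i in range(2)]
    print(theta)
```

The mean-reward baseline is computed from the same episodes it is applied to, which introduces a small bias in exchange for lower variance. An actor-critic method of the kind the abstract describes would replace that baseline with a learned value function.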
Taxonomy
Topics: Reinforcement Learning in Robotics · Viral Infectious Diseases and Gene Expression in Insects · Electric Vehicles and Infrastructure
Methods: Diffusion
