Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L)

Igor Halperin

arXiv:2104.01040 · cs.LG · April 5, 2021

TL;DR

This paper introduces SciPhy RL, a neural PDE-based approach for distributional offline continuous-time reinforcement learning, enabling high-dimensional policy learning directly from data without iterative optimization.

Contribution

It develops a neural PDE framework for solving the soft HJB equation in offline RL, allowing direct policy extraction from data with uncertainty quantification.

Findings

01. Effective high-dimensional policy learning from offline data

02. Reduces complex RL to supervised neural PDE solving

03. Provides policy quality and uncertainty estimates

Abstract

This paper addresses distributional offline continuous-time reinforcement learning (DOCTR-L) with stochastic policies for high-dimensional optimal control. A soft distributional version of the classical Hamilton-Jacobi-Bellman (HJB) equation is given by a semilinear partial differential equation (PDE). This 'soft HJB equation' can be learned from offline data without assuming that the latter correspond to a previous optimal or near-optimal policy. A data-driven solution of the soft HJB equation uses methods of Neural PDEs and Physics-Informed Neural Networks developed in the field of Scientific Machine Learning (SciML). The suggested approach, dubbed 'SciPhy RL', thus reduces DOCTR-L to solving neural PDEs from data. Our algorithm, called Deep DOCTR-L, converts offline high-dimensional data into an optimal policy in one step by reducing it to supervised learning, instead of relying on…
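The physics-informed idea mentioned in the abstract can be illustrated with a minimal sketch: a candidate function is scored by how badly it violates a PDE at a set of collocation points, and that mean squared residual serves as a supervised loss. Everything below is an assumption made for illustration. The toy semilinear equation u_t + 0.5 u_xx - 0.5 (u_x)^2 = 0 stands in for the paper's soft HJB equation, whose exact form is not given here. Central finite differences stand in for the automatic differentiation a neural PDE solver would normally use. The names `exact_u`, `wrong_u`, `pde_residual`, and `residual_loss` are hypothetical.

```python
import math

A = 1.0  # hypothetical parameter of the toy closed-form solution


def exact_u(t, x):
    """Closed-form solution of the toy PDE u_t + 0.5*u_xx - 0.5*u_x**2 = 0."""
    return -math.log(1.0 + math.exp(A * x - 0.5 * A * A * t))


def wrong_u(t, x):
    """A candidate that ignores time, so it does not satisfy the toy PDE."""
    return -math.log(1.0 + math.exp(A * x))


def pde_residual(u, t, x, h=1e-3):
    """Residual of the toy PDE at (t, x), using central finite differences."""
    u_t = (u(t + h, x) - u(t - h, x)) / (2.0 * h)
    u_x = (u(t, x + h) - u(t, x - h)) / (2.0 * h)
    u_xx = (u(t, x + h) - 2.0 * u(t, x) + u(t, x - h)) / (h * h)
    return u_t + 0.5 * u_xx - 0.5 * u_x ** 2


def residual_loss(u, points):
    """Mean squared PDE residual over collocation points (the 'physics' loss)."""
    return sum(pde_residual(u, t, x) ** 2 for t, x in points) / len(points)


# Assumed collocation grid: 5 time values by 9 space values.
points = [(0.1 * i, -1.0 + 0.25 * j) for i in range(1, 6) for j in range(9)]

loss_exact = residual_loss(exact_u, points)  # close to zero
loss_wrong = residual_loss(wrong_u, points)  # clearly larger
print(loss_exact, loss_wrong)
```

In a neural PDE setting, the candidate function would be a network and this loss would be minimized over its weights, typically together with terms fitted to data. The sketch only shows that the residual loss separates a function that satisfies the equation from one that does not.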
