Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs

Jianzhun Du; Joseph Futoma; Finale Doshi-Velez

arXiv:2006.16210 · cs.LG · October 27, 2020 · 25 cites

TL;DR

This paper introduces neural ODE-based models for semi-Markov decision processes, enabling accurate continuous-time dynamics modeling and efficient policy learning with limited data, along with optimizing interaction schedules.

Contribution

It presents novel neural ODE models for SMDPs and a model-based approach for optimizing time schedules to reduce interactions while maintaining performance.

Findings

01 Models accurately characterize continuous-time dynamics.

02 High-performing policies are achieved with limited data.

03 Optimized time schedules reduce environment interactions.

Abstract

We present two elegant solutions for modeling continuous-time dynamics in a novel model-based reinforcement learning (RL) framework for semi-Markov decision processes (SMDPs), using neural ordinary differential equations (ODEs). Our models accurately characterize continuous-time dynamics and enable us to develop high-performing policies using a small amount of data. We also develop a model-based approach for optimizing time schedules to reduce interaction rates with the environment while maintaining near-optimal performance, which is not possible for model-free methods. We experimentally demonstrate the efficacy of our methods across various continuous-time domains.
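
To make the idea in the abstract concrete, here is a minimal sketch of an ODE-style transition model for an SMDP: the state evolves under a learned derivative function while the action is held fixed, and the next state is obtained by integrating over a variable time interval tau. This is not the authors' implementation (their code is in the repository listed under Code & Models). The names `make_mlp`, `rk4_step`, and `predict_next_state`, the fixed-step RK4 integrator, and the random untrained weights are all assumptions made for illustration.

```python
import math
import random


def make_mlp(in_dim, hidden, out_dim, seed=0):
    """Hypothetical tiny tanh network with random (untrained) weights."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(out_dim)]

    def f(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]

    return f


def rk4_step(f, z, h):
    """One classical Runge-Kutta (RK4) step of size h for dz/dt = f(z)."""
    k1 = f(z)
    k2 = f([zi + 0.5 * h * k for zi, k in zip(z, k1)])
    k3 = f([zi + 0.5 * h * k for zi, k in zip(z, k2)])
    k4 = f([zi + h * k for zi, k in zip(z, k3)])
    return [
        zi + h / 6.0 * (a + 2 * b + 2 * c + d)
        for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
    ]


def predict_next_state(dynamics, state, action, tau, max_step=0.05):
    """Integrate ds/dt = dynamics(s concatenated with a) over duration tau.

    The action is held constant during the interval, which is the
    semi-Markov setting: the next decision happens tau time units later.
    """
    n_steps = max(1, math.ceil(tau / max_step))
    h = tau / n_steps
    s = list(state)
    a = list(action)
    for _ in range(n_steps):
        s = rk4_step(lambda x: dynamics(x + a), s, h)
    return s


# Usage: a 2-D state, 1-D action, and an irregular time interval.
dynamics = make_mlp(in_dim=3, hidden=8, out_dim=2, seed=0)
next_state = predict_next_state(dynamics, [0.1, -0.2], [1.0], tau=0.37)
print(next_state)
```

Because tau is an input to the integrator rather than a fixed step size, the same model can predict transitions for any interval length. That property is what a model-based schedule optimizer would need in order to compare different interaction intervals without querying the environment.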

Code & Models

Repositories

dtak/mbrl-smdp-ode · PyTorch · Official

Videos

Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Elevator Systems and Control