A model for system uncertainty in reinforcement learning

Ryan Murray; Michele Palladino

arXiv:1802.07668·math.OC·February 22, 2018·Syst. Control Lett.

TL;DR

This paper introduces a rigorous framework for modeling system uncertainty in reinforcement learning, focusing on continuous-time control in uncertain environments and deriving necessary conditions for locally optimal trajectories.

Contribution

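It develops a model in which uncertainty about the state dynamics is a measure on a space of functions that changes over time as agents learn. The model can be read as a variant of either Bayesian reinforcement learning or adaptive control, and the paper derives a dynamic programming principle for it.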

Findings

01 Derived Hamilton-Jacobi equations for the model

02 Established necessary conditions for optimal trajectories

03 Provided a framework for exploration-exploitation tradeoff

Abstract

This work provides a rigorous framework for studying continuous-time control problems in uncertain environments. The framework models uncertainty in the state dynamics as a measure on the space of functions. This measure changes over time as agents learn their environment. The model can be seen as a variant of either Bayesian reinforcement learning or adaptive control. We study necessary conditions for locally optimal trajectories within this model, in particular deriving an appropriate dynamic programming principle and Hamilton-Jacobi equations. This model provides one possible framework for studying the tradeoff between exploration and exploitation in reinforcement learning.
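
To make the idea of a measure over dynamics that changes as the agent learns more concrete, here is a minimal sketch. It is not the paper's construction: it assumes a finite set of two hypothetical candidate dynamics (`CANDIDATES`), a discrete weight on each instead of a general measure on a function space, and a Gaussian observation-noise model (`noise_std`) chosen purely for illustration.

```python
import math

# Hypothetical candidate dynamics dx/dt = f(x, u). The finite candidate set
# is an illustrative assumption, not taken from the paper.
CANDIDATES = {
    "stable": lambda x, u: -x + u,
    "unstable": lambda x, u: x + u,
}


def update_belief(belief, x, u, observed_dx, noise_std=0.5):
    """One Bayes update of the weights over candidate dynamics.

    belief      -- dict mapping candidate name to its current probability
    x, u        -- state and control at which the velocity was observed
    observed_dx -- observed (noisy) velocity
    noise_std   -- assumed Gaussian observation noise (illustrative)
    """
    unnormalized = {}
    for name, weight in belief.items():
        residual = observed_dx - CANDIDATES[name](x, u)
        likelihood = math.exp(-0.5 * (residual / noise_std) ** 2)
        unnormalized[name] = weight * likelihood
    total = sum(unnormalized.values())
    return {name: w / total for name, w in unnormalized.items()}


# Start from a uniform belief and observe a velocity of -1 at x = 1, u = 0,
# which matches the "stable" candidate exactly.
belief = {"stable": 0.5, "unstable": 0.5}
posterior = update_belief(belief, x=1.0, u=0.0, observed_dx=-1.0)
```

In this toy setting the posterior concentrates on the candidate that explains the observation. Both candidates predict the same velocity at x = 0, u = 0, so an observation there leaves the belief unchanged; the agent has to visit informative states to learn, which is a small instance of the exploration-exploitation tradeoff the abstract mentions.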
