A model for system uncertainty in reinforcement learning
Ryan Murray, Michele Palladino

TL;DR
This paper introduces a rigorous framework for modeling system uncertainty in reinforcement learning, focusing on continuous control in uncertain environments and deriving conditions for optimal trajectories.
Contribution
It develops a novel measure-based model for uncertainty that evolves over time, bridging Bayesian RL and adaptive control, with new dynamic programming principles.
Findings
Derived Hamilton-Jacobi equations for the model
Established necessary conditions for optimal trajectories
Provided a framework for exploration-exploitation tradeoff
Abstract
This work provides a rigorous framework for studying continuous-time control problems in uncertain environments. The framework models uncertainty in the state dynamics as a measure on a space of functions. This measure changes over time as agents learn their environment. The model can be seen as a variant of either Bayesian reinforcement learning or adaptive control. We study necessary conditions for locally optimal trajectories within this model, in particular deriving an appropriate dynamic programming principle and Hamilton-Jacobi equations. This model provides one possible framework for studying the tradeoff between exploration and exploitation in reinforcement learning.
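To make the idea of a measure over dynamics that changes as the agent learns more concrete, here is a minimal toy sketch. It is not the paper's construction: it assumes a finite set of hypothetical candidate drift functions (`CANDIDATES`) in place of a measure on a function space, an Euler discretization in place of continuous time, and a Gaussian observation model with an assumed `noise_std`. All names and parameter values are illustrative.

```python
import math

# Hypothetical candidate drifts f_i(x, u). A finite list stands in for a
# measure on a space of functions (an assumption made for illustration).
CANDIDATES = [
    lambda x, u: -1.0 * x + u,
    lambda x, u: -0.5 * x + u,
    lambda x, u: 0.5 * x + u,
]


def update_log_weights(log_w, x, u, x_next, dt, noise_std):
    """Bayes update: add each candidate's Gaussian log-likelihood of the step."""
    updated = []
    for lw, f in zip(log_w, CANDIDATES):
        residual = x_next - (x + dt * f(x, u))
        updated.append(lw - 0.5 * (residual / noise_std) ** 2)
    return updated


def normalize(log_w):
    """Turn unnormalized log-weights into a probability vector."""
    m = max(log_w)
    unnormalized = [math.exp(lw - m) for lw in log_w]
    total = sum(unnormalized)
    return [v / total for v in unnormalized]


def run_demo(steps=50, dt=0.1, noise_std=0.05):
    """Observe a trajectory of the (assumed) true drift and return the posterior."""
    true_f = CANDIDATES[1]            # assumed ground truth for the demo
    log_w = [0.0] * len(CANDIDATES)   # uniform prior over candidates
    x = 1.0
    for _ in range(steps):
        u = 0.0                       # zero control, purely passive observation
        x_next = x + dt * true_f(x, u)
        log_w = update_log_weights(log_w, x, u, x_next, dt, noise_std)
        x = x_next
    return normalize(log_w)


if __name__ == "__main__":
    print(run_demo())
```

In this toy the candidates differ only in how they respond to the state, so observations distinguish them less and less as `x` decays toward zero. That gives a small illustration of why the choice of control can affect how quickly uncertainty is resolved, which is the exploration-exploitation tension the abstract refers to.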
