Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach
Mohammad S. Ramadan, Mahmoud A. Hayajnh, Michael T. Tolley, Kyriakos G. Vamvoudakis

TL;DR
This paper introduces a reinforcement learning framework that combines active exploration with stochastic optimal control to improve decision-making under uncertainty, while addressing computational challenges and enabling real-time cautious exploration.
Contribution
It presents a novel RL approach that integrates stochastic optimal control to facilitate active learning, automatic exploration, and computational efficiency in uncertain environments.
Findings
The proposed method stabilizes control performance under uncertainty.
It outperforms certainty equivalence-based LQ regulators in simulations.
The approach enables real-time cautious exploration and exploitation.
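For context on the baseline named in the findings, a certainty-equivalence LQ regulator computes its feedback gain from parameter estimates as if they were the true values, ignoring their uncertainty. The block below is a minimal sketch of that idea and is not taken from the paper: the function names, the scalar example system, and the cost weights are all illustrative assumptions.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion; return K for u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        # Gain implied by the current cost-to-go matrix P.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A'PA - A'PB K.
        P = Q + A.T @ P @ (A - B @ K)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def certainty_equivalence_gain(A_hat, B_hat, Q, R):
    """Treat the estimates (A_hat, B_hat) as exact and design an LQ gain.

    Hypothetical helper: the estimation error plays no role in the design,
    which is the defining feature of certainty equivalence.
    """
    return lqr_gain(A_hat, B_hat, Q, R)


# Assumed scalar example: estimated dynamics x_next = 1.0 * x + 1.0 * u.
A_hat = np.array([[1.0]])
B_hat = np.array([[1.0]])
Q = np.array([[1.0]])  # state cost weight (assumed)
R = np.array([[1.0]])  # input cost weight (assumed)

K = certainty_equivalence_gain(A_hat, B_hat, Q, R)
closed_loop = A_hat - B_hat @ K
```

Because the gain depends only on the point estimates, such a controller has no incentive to probe the system or to act cautiously when the estimates are poor, which is the gap the summary above says the proposed method addresses.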
Abstract
In this paper we propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, such that it regulates state and parameter uncertainties resulting from modeling mismatches and noisy sensory; and (ii) overcoming the computational intractability of stochastic optimal control. We approach both objectives by using reinforcement learning to compute the stochastic optimal control law. On one hand, we avoid the curse of dimensionality prohibiting the direct solution of the stochastic dynamic programming equation. On the other hand, the resulting stochastic optimal control reinforcement learning agent admits caution and probing, that is, optimal online exploration and exploitation. Unlike fixed exploration and exploitation balance, caution and probing are employed automatically by the…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Supply Chain and Inventory Management
