A Multilevel Reinforcement Learning Framework for PDE-based Control
Atish Dixit, Ahmed Elsheikh

TL;DR
This paper introduces a multilevel reinforcement learning framework that reduces computational costs in PDE-based control problems by leveraging coarser models and multilevel Monte Carlo estimates, demonstrated with a multilevel PPO algorithm.
Contribution
It proposes a novel multilevel RL framework that exploits sublevel models to improve efficiency in PDE-based control, specifically through a multilevel PPO algorithm.
Findings
Significant computational savings with multilevel PPO.
Effective use of coarser grid models for RL.
Demonstrated on stochastic PDE simulation environments.
Abstract
Reinforcement learning (RL) is a promising method to solve control problems. However, model-free RL algorithms are sample-inefficient and require thousands, if not millions, of samples to learn optimal control policies. A major source of computational cost in RL is the transition function, which is dictated by the model dynamics. This is especially problematic when the model dynamics are represented by coupled PDEs. In such cases, evaluating the transition function often involves solving a large-scale discretization of these PDEs. We propose a multilevel RL framework to ease this cost by exploiting sublevel models that correspond to coarser-scale discretizations (i.e., multilevel models). This is done by formulating an approximate multilevel Monte Carlo estimate of the objective function of the policy and/or value network instead of Monte Carlo estimates, as done in the…
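The multilevel Monte Carlo idea mentioned in the abstract can be illustrated with a minimal sketch. This is the standard telescoping-sum estimator, E[R_L] = E[R_0] + sum over l of E[R_l - R_(l-1)], and not the paper's approximate multilevel PPO objective. The function `level_return` is a hypothetical stand-in for an episode return computed on grid level `level`: it uses a toy midpoint-rule integral rather than a PDE solver, and the sample counts are arbitrary assumptions.

```python
import numpy as np


def level_return(xi, level):
    """Hypothetical stand-in for an episode return on grid level `level`.

    Approximates the integral of exp(xi * x) over [0, 1] with the midpoint
    rule on 2**(level + 1) cells, so finer levels are more accurate and
    more expensive, loosely mimicking a PDE solve on a finer grid.
    """
    n_cells = 2 ** (level + 1)
    x = (np.arange(n_cells) + 0.5) / n_cells      # cell midpoints
    return np.exp(np.outer(xi, x)).mean(axis=1)   # one value per sample


def mlmc_estimate(samples_per_level, rng):
    """Telescoping-sum estimate of the expected finest-level return.

    samples_per_level[l] is the number of samples drawn at level l.
    Most samples go to the cheap coarse level; the corrections between
    consecutive levels have small variance and need fewer samples.
    """
    total = 0.0
    for level, n_samples in enumerate(samples_per_level):
        xi = rng.normal(size=n_samples)  # random input shared by both levels
        fine = level_return(xi, level)
        if level == 0:
            total += fine.mean()
        else:
            # Same xi on both grids keeps the correction term low-variance.
            coarse = level_return(xi, level - 1)
            total += (fine - coarse).mean()
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(mlmc_estimate([4000, 1000, 250], rng))
```

The key design choice is that each correction term evaluates the fine and coarse models on the same random input, which is what lets the expensive fine level get by with far fewer samples than a single-level Monte Carlo estimate would need.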
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics · Energy Efficiency and Management
Methods: Entropy Regularization · Proximal Policy Optimization
