Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent
Adrien Bolland, Ioannis Boukas, Mathias Berger, Damien Ernst

TL;DR
This paper introduces DEPS, a deep reinforcement learning algorithm that jointly optimizes environment design and control policies for stochastic systems and outperforms the benchmark methods it is compared against on several control tasks.
Contribution
The paper presents a novel algorithm combining policy gradient and model-based optimization for joint environment and policy design in stochastic systems.
Findings
DEPS reaches higher returns in fewer iterations than the benchmark methods.
Joint optimization of environment and policy yields better performance than separate optimization.
DEPS performs effectively across diverse control environments.
Abstract
We consider the joint design and control of discrete-time stochastic dynamical systems over a finite time horizon. We formulate the problem as a multi-step optimization problem under uncertainty seeking to identify a system design and a control policy that jointly maximize the expected sum of rewards collected over the time horizon considered. The transition function, the reward function and the policy are all parametrized, assumed known and differentiable with respect to their parameters. We then introduce a deep reinforcement learning algorithm combining policy gradient methods with model-based optimization techniques to solve this problem. In essence, our algorithm iteratively approximates the gradient of the expected return via Monte-Carlo sampling and automatic differentiation and takes projected gradient ascent steps in the space of environment and policy parameters. This…
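The loop described in the abstract has three steps: estimate the gradient of the expected return by Monte-Carlo sampling, take an ascent step in the joint space of environment and policy parameters, then project back onto the feasible parameter sets. The sketch below shows that loop on a hypothetical toy problem. Everything in it is an illustrative assumption and not the authors' DEPS implementation: the scalar dynamics, the bounds, the costs, and the names `rollout`, `projected_sga` and `expected_return`. It also makes two deliberate swaps. Derivatives are propagated by hand in forward mode instead of with an automatic-differentiation library, and the policy gradient uses the reparameterization (pathwise) estimator rather than a score-function (REINFORCE-style) estimator.

```python
import random

# Hypothetical toy problem (NOT from the paper): scalar state s, one design
# parameter theta (actuator gain) and one policy parameter k (feedback gain).
HORIZON = 10
SIGMA_ACTION = 0.1         # std of the Gaussian policy's exploration noise
SIGMA_PROCESS = 0.05       # std of the additive process noise
RHO = 0.1                  # weight of the action penalty in the reward
DESIGN_COST = 0.1          # one-off cost of a larger actuator gain
THETA_BOUNDS = (0.2, 0.8)  # assumed feasible design set (a box)
K_BOUNDS = (0.0, 2.0)      # assumed feasible policy-parameter set (a box)


def sample_noise(rng):
    """Draw the exogenous randomness of one episode."""
    eps = [rng.gauss(0.0, 1.0) for _ in range(HORIZON)]
    w = [rng.gauss(0.0, SIGMA_PROCESS) for _ in range(HORIZON)]
    return eps, w


def rollout(theta, k, eps, w):
    """Return (G, dG/dtheta, dG/dk) for one episode with fixed noise.

    Derivatives are carried forward alongside the state (hand-written
    forward-mode differentiation), which works because the toy transition
    and reward functions are known and differentiable.
    """
    s, ds_dth, ds_dk = 1.0, 0.0, 0.0  # fixed initial state s_0 = 1
    g = -DESIGN_COST * theta ** 2
    dg_dth, dg_dk = -2.0 * DESIGN_COST * theta, 0.0
    for t in range(HORIZON):
        # Reparameterized Gaussian policy: a ~ N(-k * s, SIGMA_ACTION^2).
        a = -k * s + SIGMA_ACTION * eps[t]
        da_dth = -k * ds_dth
        da_dk = -s - k * ds_dk
        # Design-dependent transition: s' = s + theta * a + w.
        s_next = s + theta * a + w[t]
        dsn_dth = ds_dth + a + theta * da_dth
        dsn_dk = ds_dk + theta * da_dk
        # Reward: r = -s'^2 - RHO * a^2.
        g += -s_next ** 2 - RHO * a ** 2
        dg_dth += -2.0 * s_next * dsn_dth - 2.0 * RHO * a * da_dth
        dg_dk += -2.0 * s_next * dsn_dk - 2.0 * RHO * a * da_dk
        s, ds_dth, ds_dk = s_next, dsn_dth, dsn_dk
    return g, dg_dth, dg_dk


def clip(x, bounds):
    """Euclidean projection of a scalar onto an interval."""
    lo, hi = bounds
    return min(max(x, lo), hi)


def projected_sga(theta, k, iterations=300, batch=64, lr=0.01, seed=0):
    """Projected stochastic gradient ascent on (theta, k)."""
    rng = random.Random(seed)
    for _ in range(iterations):
        g_th = g_k = 0.0
        for _ in range(batch):  # Monte-Carlo estimate of the return gradient
            _, d_th, d_k = rollout(theta, k, *sample_noise(rng))
            g_th += d_th / batch
            g_k += d_k / batch
        # Ascent step, then projection back onto the feasible boxes.
        theta = clip(theta + lr * g_th, THETA_BOUNDS)
        k = clip(k + lr * g_k, K_BOUNDS)
    return theta, k


def expected_return(theta, k, n=2000, seed=123):
    """Monte-Carlo estimate of the expected return (fixed seed for comparisons)."""
    rng = random.Random(seed)
    return sum(rollout(theta, k, *sample_noise(rng))[0] for _ in range(n)) / n


if __name__ == "__main__":
    theta0, k0 = 0.5, 0.1
    theta, k = projected_sga(theta0, k0)
    print(f"before: J = {expected_return(theta0, k0):.3f}")
    print(f"after:  theta = {theta:.3f}, k = {k:.3f}, J = {expected_return(theta, k):.3f}")
```

The feasible sets here are boxes, so the Euclidean projection reduces to a coordinate-wise `clip`. A different convex feasible set would only require replacing that one function. With these made-up numbers, the upper design bound may become active during training, which is where the projection step actually changes the iterate.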
Taxonomy
Topics: Energy, Environment, and Transportation Policies · Reinforcement Learning in Robotics · Smart Grid Energy Management
Methods: REINFORCE
