Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent

Adrien Bolland; Ioannis Boukas; Mathias Berger; Damien Ernst

arXiv:2006.01738·cs.LG·January 7, 2022

TL;DR

This paper introduces DEPS, a deep reinforcement learning algorithm that jointly optimizes the design of a stochastic system (its environment) and the policy that controls it, reaching higher returns in fewer iterations than the benchmarks across several control tasks.

Contribution

The paper presents a novel algorithm combining policy gradient and model-based optimization for joint environment and policy design in stochastic systems.

Findings

1. DEPS achieves higher returns in fewer iterations than the benchmarks.

2. Jointly optimizing the environment and the policy yields better performance than optimizing them separately.

3. DEPS performs effectively across diverse control environments.

Abstract

We consider the joint design and control of discrete-time stochastic dynamical systems over a finite time horizon. We formulate the problem as a multi-step optimization problem under uncertainty seeking to identify a system design and a control policy that jointly maximize the expected sum of rewards collected over the time horizon considered. The transition function, the reward function and the policy are all parametrized, assumed known and differentiable with respect to their parameters. We then introduce a deep reinforcement learning algorithm combining policy gradient methods with model-based optimization techniques to solve this problem. In essence, our algorithm iteratively approximates the gradient of the expected return via Monte-Carlo sampling and automatic differentiation and takes projected gradient ascent steps in the space of environment and policy parameters. This…
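
The loop described in the abstract (sample trajectories, estimate the gradient of the expected return, take a projected ascent step on both the environment and the policy parameters) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the scalar damped system, the linear deterministic policy `u = -theta * x`, the reward weights, the box constraints `PSI_BOUNDS` and `THETA_BOUNDS`, and the function names. To stay dependency-free, hand-derived forward sensitivities stand in for automatic differentiation, and the gradient is a pathwise Monte-Carlo estimate rather than the estimator used in the paper.

```python
import random

# Hypothetical feasible sets (box constraints), chosen only for this sketch.
PSI_BOUNDS = (1.0, 2.0)    # environment parameter: damping coefficient
THETA_BOUNDS = (0.0, 5.0)  # policy parameter: feedback gain


def project(value, bounds):
    """Euclidean projection of a scalar onto the interval [low, high]."""
    low, high = bounds
    return max(low, min(high, value))


def rollout(psi, theta, rng, horizon=20, dt=0.1, sigma=0.05):
    """Simulate one trajectory of x' = x + dt * (-psi * x + u) + noise.

    Returns (return, d return / d psi, d return / d theta). The derivatives
    are propagated by hand alongside the state (forward sensitivities).
    """
    a = 1.0 - dt * (psi + theta)            # closed-loop coefficient with u = -theta * x
    x = 1.0 + sigma * rng.gauss(0.0, 1.0)   # random initial state
    dx_dpsi = dx_dtheta = 0.0               # initial state does not depend on parameters
    ret = g_psi = g_theta = 0.0
    for _ in range(horizon):
        # Reward: -x^2 (state) - 0.1 * u^2 (control effort) - 0.05 * psi^2 (design cost)
        w = 1.0 + 0.1 * theta * theta
        ret += -w * x * x - 0.05 * psi * psi
        g_psi += -2.0 * w * x * dx_dpsi - 0.1 * psi
        g_theta += -2.0 * w * x * dx_dtheta - 0.2 * theta * x * x
        # Sensitivities of the next state, computed from the current state.
        dx_dpsi = -dt * x + a * dx_dpsi
        dx_dtheta = -dt * x + a * dx_dtheta
        x = a * x + sigma * rng.gauss(0.0, 1.0)
    return ret, g_psi, g_theta


def train(psi=1.5, theta=0.5, iterations=300, batch=16, lr=0.05, seed=0):
    """Projected stochastic gradient ascent on (psi, theta)."""
    rng = random.Random(seed)
    for _ in range(iterations):
        samples = [rollout(psi, theta, rng) for _ in range(batch)]
        g_psi = sum(s[1] for s in samples) / batch      # Monte-Carlo gradient estimate
        g_theta = sum(s[2] for s in samples) / batch
        psi = project(psi + lr * g_psi, PSI_BOUNDS)     # ascent step, then projection
        theta = project(theta + lr * g_theta, THETA_BOUNDS)
    return psi, theta


def estimate_return(psi, theta, n=500, seed=1):
    """Monte-Carlo estimate of the expected return for fixed parameters."""
    rng = random.Random(seed)
    return sum(rollout(psi, theta, rng)[0] for _ in range(n)) / n


if __name__ == "__main__":
    psi, theta = train()
    print("learned design and gain:", psi, theta)
    print("estimated return before:", estimate_return(1.5, 0.5))
    print("estimated return after: ", estimate_return(psi, theta))
```

In this toy problem the design cost pulls the damping parameter toward the lower end of its feasible interval while the policy gain grows to compensate, so the projection step is what keeps the design feasible. Replacing the hand-written sensitivities with an automatic differentiation library would bring the sketch closer to the procedure the abstract describes.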

Code & Models

Repositories

adrienbolland/jointly-learning-environments-and-control-policies-with-projected-stochastic-gradient-ascent
PyTorch · Official

Taxonomy

Topics: Energy, Environment, and Transportation Policies · Reinforcement Learning in Robotics · Smart Grid Energy Management

Methods: REINFORCE