Learning Synthetic Environments and Reward Networks for Reinforcement Learning
Fabio Ferreira, Thomas Nierhoff, Andreas Saelinger, Frank Hutter

TL;DR
This paper introduces Synthetic Environments and Reward Networks, neural-network proxy models for training reinforcement learning agents. Once a proxy has been learned, new agents can be trained on it with fewer real environment interactions, and the proxies transfer to unseen agents.
Contribution
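The paper proposes a bi-level optimization framework in which an inner loop trains the RL agent and an outer loop evolves the parameters of the proxy environment or reward model via an evolution strategy, with the aim of improving RL training efficiency and transferability.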
Findings
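Once learned, SE proxies reduce the real environment interactions needed to train new agents (learning the SE itself requires more real interactions than training directly on the real environment)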
Agents trained on SEs perform comparably to those trained on real environments
SEs are robust to hyperparameter changes and transfer to unseen agents
Abstract
We introduce Synthetic Environments (SEs) and Reward Networks (RNs), represented by neural networks, as proxy environment models for training Reinforcement Learning (RL) agents. We show that an agent, after being trained exclusively on the SE, is able to solve the corresponding real environment. While an SE acts as a full proxy to a real environment by learning about its state dynamics and rewards, an RN is a partial proxy that learns to augment or replace rewards. We use bi-level optimization to evolve SEs and RNs: the inner loop trains the RL agent, and the outer loop trains the parameters of the SE / RN via an evolution strategy. We evaluate our proposed new concept on a broad range of RL algorithms and classic control environments. In a one-to-one comparison, learning an SE proxy requires more interactions with the real environment than training agents only on the real environment.…
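The bi-level scheme described in the abstract can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the "real environment" is a one-step task with a hidden optimal action `TARGET`, the SE is a single scalar `psi` that defines a synthetic reward, the agent is a single parameter trained by gradient ascent, and the outer loop is a basic score-weighted evolution strategy. The names `real_return`, `train_agent_on_se`, and `evolve_se` are hypothetical.

```python
import random

TARGET = 2.0  # hidden optimum of the toy "real" task (illustrative assumption)


def real_return(action):
    """Toy stand-in for a real-environment rollout: higher is better."""
    return -(action - TARGET) ** 2


def train_agent_on_se(psi, steps=50, lr=0.1):
    """Inner loop: train a one-parameter agent using only the synthetic reward.

    The synthetic reward is -(action - psi)^2, so gradient ascent moves the
    agent's action toward psi without touching the real environment.
    """
    theta = 0.0
    for _ in range(steps):
        grad = -2.0 * (theta - psi)  # derivative of the synthetic reward
        theta += lr * grad
    return theta


def evolve_se(generations=200, population=10, sigma=0.1, lr=0.05, seed=0):
    """Outer loop: a simple evolution strategy over the SE parameter psi."""
    rng = random.Random(seed)
    psi = 0.0
    for _ in range(generations):
        noises = [rng.gauss(0.0, 1.0) for _ in range(population)]
        scores = []
        for eps in noises:
            # Train a fresh agent on the perturbed SE, then score that agent
            # on the real task. Only this scoring step uses the real task.
            agent = train_agent_on_se(psi + sigma * eps)
            scores.append(real_return(agent))
        baseline = sum(scores) / population
        # Score-weighted sum of perturbations approximates the gradient of
        # the real return with respect to psi.
        step = sum((s - baseline) * e for s, e in zip(scores, noises))
        psi += lr * step / (population * sigma)
    return psi


if __name__ == "__main__":
    learned_psi = evolve_se()
    new_agent = train_agent_on_se(learned_psi)  # no real interactions here
    print(learned_psi, new_agent, real_return(new_agent))
```

In this sketch the real task is queried only inside the outer loop, when scoring agents trained on perturbed SEs. After `evolve_se` returns, a new agent is trained with `train_agent_on_se` alone, which mirrors the idea of training exclusively on the SE.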
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Applications · Data Stream Mining Techniques
