How the level sampling process impacts zero-shot generalisation in deep reinforcement learning
Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, and Stefano V. Albrecht

TL;DR
This paper explores how different level sampling strategies in deep reinforcement learning influence zero-shot generalisation, revealing that adaptive and self-supervised methods can improve generalisation by controlling overfitting and over-generalisation.
Contribution
The paper introduces SSED, a self-supervised environment design approach that reduces the mutual information between the agent's internal representation and the training levels and improves zero-shot generalisation in RL agents.
Findings
Adaptive sampling based on value loss reduces overfitting.
Unsupervised environment design (UED) methods can cause over-generalisation and degrade zero-shot generalisation (ZSG).
SSED improves ZSG performance significantly.
Abstract
A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical…
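To illustrate what prioritising levels by their value loss could look like in practice, here is a minimal sketch. The abstract states only that levels are prioritised by value loss. The rank-based weighting, the `temperature` parameter and its default, and the function names `rank_prioritised_probs` and `sample_level` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def rank_prioritised_probs(value_losses, temperature=0.3):
    """Turn per-level value-loss scores into sampling probabilities.

    Assumed scheme: levels are ranked by score (rank 1 = highest loss) and
    each level receives weight (1 / rank) ** (1 / temperature), normalised
    so the weights sum to 1.
    """
    scores = np.asarray(value_losses, dtype=float)
    # Indices of levels ordered from highest to lowest value loss.
    order = np.argsort(-scores)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(scores) + 1)
    weights = (1.0 / ranks) ** (1.0 / temperature)
    return weights / weights.sum()


def sample_level(value_losses, rng, temperature=0.3):
    """Draw one level index according to the prioritised distribution."""
    probs = rank_prioritised_probs(value_losses, temperature)
    return int(rng.choice(len(probs), p=probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    losses = [0.1, 2.0, 0.5]  # hypothetical per-level value losses
    print(rank_prioritised_probs(losses))
    print(sample_level(losses, rng))
```

In this sketch a lower `temperature` concentrates sampling on the highest-loss levels, while a very large `temperature` approaches uniform sampling, the baseline the abstract compares against.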
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Robot Manipulation and Learning
