How the level sampling process impacts zero-shot generalisation in deep reinforcement learning

Samuel Garcin; James Doran; Shangmin Guo; Christopher G. Lucas; Stefano V. Albrecht

arXiv:2310.03494 · cs.LG · December 12, 2023

Open Access

TL;DR

This paper explores how different level sampling strategies in deep reinforcement learning influence zero-shot generalisation, revealing that adaptive and self-supervised methods can improve generalisation by controlling overfitting and over-generalisation.

Contribution

The paper introduces SSED, a self-supervised environment design approach that reduces the mutual information between the agent's internal representation and the set of training levels, and improves zero-shot generalisation in RL agents.

Findings

01

Adaptive sampling based on value loss reduces overfitting.

02

Unsupervised environment design (UED) methods can cause over-generalisation and degrade zero-shot generalisation (ZSG).

03

SSED improves ZSG performance significantly.
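
The first finding concerns sampling strategies that prioritise levels by their value loss. The following is a minimal sketch of what such a sampler could look like, assuming a rank-based prioritisation with a temperature parameter. The function names, the rank transform, and the default `temperature` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def level_sampling_probs(value_losses, temperature=0.5):
    """Turn per-level value losses into sampling probabilities.

    Assumption: rank-based prioritisation, where the level with the
    largest value loss gets rank 1 and weight (1 / rank) ** (1 / temperature).
    """
    losses = np.asarray(value_losses, dtype=float)
    # Sort indices by descending loss, then assign ranks 1..N.
    order = np.argsort(-losses)
    ranks = np.empty(len(losses), dtype=float)
    ranks[order] = np.arange(1, len(losses) + 1)
    weights = (1.0 / ranks) ** (1.0 / temperature)
    return weights / weights.sum()


def sample_level(value_losses, rng, temperature=0.5):
    """Draw one level index according to the prioritised distribution."""
    probs = level_sampling_probs(value_losses, temperature)
    return int(rng.choice(len(probs), p=probs))


# Hypothetical value losses for three training levels.
rng = np.random.default_rng(0)
probs = level_sampling_probs([0.1, 2.0, 0.5])
next_level = sample_level([0.1, 2.0, 0.5], rng)
```

A lower `temperature` concentrates sampling on the highest-loss levels, while a very large value approaches uniform sampling, which is the baseline the paper compares against.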

Abstract

A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical…
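
The abstract refers to measuring the mutual information between the agent's internal representation and the set of training levels. One common way to estimate such a quantity, sketched here under stated assumptions, is the variational lower bound I(Z; L) ≥ H(L) − CE, where CE is the cross-entropy of a classifier that predicts the level identity L from the representation Z. Whether the paper uses this exact estimator is not stated in the text above; the function name `mi_lower_bound` and its arguments are hypothetical.

```python
import numpy as np


def mi_lower_bound(pred_probs, level_ids, level_dist=None):
    """Lower bound on I(Z; L) from a level classifier's predictions.

    pred_probs: array of shape (num_samples, num_levels), the classifier's
                predicted distribution over levels for each representation.
    level_ids:  true level index for each sample.
    level_dist: sampling distribution over levels; assumed uniform if None.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    level_ids = np.asarray(level_ids)
    n_levels = pred_probs.shape[1]

    if level_dist is None:
        # Assumption: levels are sampled uniformly, so H(L) = log(N).
        entropy = np.log(n_levels)
    else:
        p = np.asarray(level_dist, dtype=float)
        p = p[p > 0]
        entropy = -np.sum(p * np.log(p))

    # Cross-entropy of the true level under the classifier (in nats).
    p_true = pred_probs[np.arange(len(level_ids)), level_ids]
    cross_entropy = -np.mean(np.log(np.clip(p_true, 1e-12, 1.0)))
    return entropy - cross_entropy


# A classifier that cannot tell levels apart gives a bound of 0 nats;
# one that identifies every level perfectly gives log(N) nats.
uninformative = mi_lower_bound(np.full((4, 4), 0.25), [0, 1, 2, 3])
perfect = mi_lower_bound(np.eye(4), [0, 1, 2, 3])
```

Under this reading, a representation from which the training level is easy to identify yields a high bound, which matches the abstract's link between high mutual information and instance overfitting.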

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Robot Manipulation and Learning