A Self-Supervised Auxiliary Loss for Deep RL in Partially Observable Settings
Eltayeb Ahmed, Luisa Zintgraf, Christian A. Schroeder de Witt, and Nicolas Usunier

TL;DR
This paper introduces an auxiliary loss for deep reinforcement learning in partially observable environments, encouraging agents to develop spatial reasoning by predicting the order of state pairs, leading to improved navigation performance.
Contribution
The paper proposes a novel auxiliary loss based on state pair ordering prediction to enhance spatial reasoning in deep RL agents in partially observable settings.
Findings
Achieved a 9.6% increase in episode reward in gridworld navigation.
Demonstrated the auxiliary loss improves spatial reasoning capabilities.
Validated the approach on a navigation task with positive results.
Abstract
In this work we explore an auxiliary loss useful for reinforcement learning in environments where strong-performing agents are required to be able to navigate a spatial environment. The proposed auxiliary loss minimizes the classification error of a neural network classifier that predicts whether or not a pair of states sampled from the agent's current episode trajectory is in order. The classifier takes as input a pair of states as well as the agent's memory. The motivation for this auxiliary loss is that there is a strong correlation between which of a pair of states is more recent in the agent's episode trajectory and which of the two states is spatially closer to the agent. Our hypothesis is that learning features to answer this question encourages the agent to learn and internalize in memory representations of states that facilitate spatial reasoning. We tested this auxiliary loss…
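The ordering objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the uniform pair-sampling scheme, the array shapes, and the linear classifier (standing in for the paper's neural network classifier) are all assumptions made for illustration. The sketch samples pairs of states from one episode, labels each pair by whether it is in temporal order, and computes the binary cross-entropy of a classifier that sees both states and the agent's memory.

```python
import numpy as np


def sample_ordered_pairs(trajectory_len, num_pairs, rng):
    """Sample index pairs (i, j) with i != j; label is 1 when i comes before j.

    Hypothetical sampling scheme: uniform over distinct pairs.
    Requires trajectory_len >= 2.
    """
    i = rng.integers(0, trajectory_len, size=num_pairs)
    # An offset in [1, trajectory_len - 1] guarantees j != i after wrapping.
    offset = rng.integers(1, trajectory_len, size=num_pairs)
    j = (i + offset) % trajectory_len
    labels = (i < j).astype(np.float64)
    return i, j, labels


def order_prediction_loss(states, memory, weights, bias, i, j, labels):
    """Mean binary cross-entropy of a linear classifier on [state_i, state_j, memory].

    Assumption: a linear model replaces the paper's neural network classifier.
    """
    num_pairs = len(i)
    # The same memory vector is paired with every sampled state pair.
    mem = np.broadcast_to(memory, (num_pairs, memory.shape[0]))
    features = np.concatenate([states[i], states[j], mem], axis=1)
    logits = features @ weights + bias
    # Numerically stable cross-entropy computed from logits.
    losses = (
        np.maximum(logits, 0.0)
        - logits * labels
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return losses.mean()


# Toy usage with random data (all dimensions are illustrative).
rng = np.random.default_rng(0)
T, state_dim, mem_dim = 20, 8, 16
states = rng.normal(size=(T, state_dim))   # encoded states of one episode
memory = rng.normal(size=mem_dim)          # agent memory, e.g. a recurrent state
weights = np.zeros(2 * state_dim + mem_dim)
i, j, labels = sample_ordered_pairs(T, 32, rng)
aux_loss = order_prediction_loss(states, memory, weights, 0.0, i, j, labels)
```

With untrained (all-zero) weights the classifier outputs a probability of 0.5 for every pair, so the loss equals log 2. In training, a term like this would presumably be added to the usual RL objective with some weighting coefficient. The summary above does not state how the two are combined.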
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
