# Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

**Authors:** Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

arXiv: 1907.10827 · 2019-07-26

## TL;DR

This paper introduces Terminal Prediction (TP), a self-supervised auxiliary task in which a deep RL agent predicts how close it is to a terminal state, with the aim of improving policy convergence and sample efficiency. Combined with A3C, it is evaluated on Atari games, BipedalWalker, and a mini version of Pommerman.

## Contribution

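The paper proposes Terminal Prediction, a novel auxiliary task for deep RL intended to help representation learning while the agent learns its control policy. Although TP could be integrated with multiple algorithms, the paper focuses on its integration with A3C (A3C-TP).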

## Key findings

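- On Atari games and BipedalWalker, A3C-TP outperforms standard A3C in most of the tested domains and performs similarly in the others.
- In a mini version of Pommerman, A3C-TP significantly improves learning efficiency.
- In Pommerman, A3C-TP also converges to better policies against different opponents.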

## Abstract

Deep reinforcement learning has achieved great successes in recent years, but there are still open challenges, such as convergence to locally optimal policies and sample inefficiency. In this paper, we contribute a novel self-supervised auxiliary task, i.e., Terminal Prediction (TP), estimating temporal closeness to terminal states for episodic tasks. The intuition is to help representation learning by letting the agent predict how close it is to a terminal state, while learning its control policy. Although TP could be integrated with multiple algorithms, this paper focuses on Asynchronous Advantage Actor-Critic (A3C) and on demonstrating the advantages of A3C-TP. Our extensive evaluation includes: a set of Atari games, the BipedalWalker domain, and a mini version of the recently proposed multi-agent Pommerman game. Our results on Atari games and the BipedalWalker domain suggest that A3C-TP outperforms standard A3C in most of the tested domains and in others it has similar performance. In Pommerman, our proposed method provides significant improvement both in learning efficiency and in converging to better policies against different opponents.
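
The abstract describes TP only as "estimating temporal closeness to terminal states" and does not give a formula in this summary view. As a minimal sketch of how such an auxiliary target could be built, the Python below assumes that the i-th state of an N-step episode is labeled i/N, that the auxiliary loss is a mean squared error, and that it is added to the base actor-critic loss with a fixed weight. The function names, the labeling scheme, and the default weight of 0.5 are all illustrative assumptions, not details confirmed by this page.

```python
from typing import List, Sequence


def terminal_prediction_targets(episode_length: int) -> List[float]:
    """Assumed labels: the i-th state (1-indexed) of an N-step episode gets i / N.

    Labels grow linearly toward 1.0 as the episode nears its terminal state.
    They are self-supervised: no annotation is needed beyond the episode length.
    """
    if episode_length < 1:
        raise ValueError("episode_length must be at least 1")
    return [i / episode_length for i in range(1, episode_length + 1)]


def terminal_prediction_loss(
    predictions: Sequence[float], targets: Sequence[float]
) -> float:
    """Mean squared error between predicted and assumed closeness labels."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


def combined_loss(base_loss: float, tp_loss: float, tp_weight: float = 0.5) -> float:
    """Add the auxiliary term to a base loss; tp_weight is a hypothetical value."""
    return base_loss + tp_weight * tp_loss


if __name__ == "__main__":
    targets = terminal_prediction_targets(4)
    # A constant prediction of 0.5 stands in for the output of a prediction head.
    loss = terminal_prediction_loss([0.5, 0.5, 0.5, 0.5], targets)
    print(targets, loss, combined_loss(1.0, loss))
```

In a real agent the predictions would come from an extra output head sharing the network's representation with the policy and value heads, so that gradients from the auxiliary loss shape the shared features. Because the episode length is known only once the episode ends, an online learner would also need some estimate of it; how that is handled is left out of this sketch.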

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.10827/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/1907.10827/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1907.10827/full.md

---
Source: https://tomesphere.com/paper/1907.10827