# Novelty Search for Deep Reinforcement Learning Policy Network Weights by Action Sequence Edit Metric Distance

**Authors:** Ethan C. Jackson, Mark Daley

arXiv: 1902.03142 · 2019-02-11

## TL;DR

This paper introduces novelty search over agent action sequences, compared by string edit distance, to promote innovation in deep reinforcement learning. Experiments on four Atari 2600 games show that it is an effective source of selection pressure and add evidence that genetic algorithms (GAs) are competitive with gradient-based methods.

## Contribution

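It proposes novelty search over agent action sequences using a string edit metric, together with a stagnation detection and population resampling method built on the same mechanisms. Both extend a simple genetic algorithm for learning deep RL policy network weights.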

## Key findings

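- Novelty search over action sequences is an effective source of selection pressure and can be integrated into existing evolutionary algorithms for deep RL.
- Results on four Atari 2600 games provide further evidence that genetic algorithms are competitive with gradient-based methods for deep RL.
- Stagnation detection with population resampling reuses the novelty search mechanisms to promote and develop innovative policies.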

## Abstract

Reinforcement learning (RL) problems often feature deceptive local optima, and learning methods that optimize purely for reward signal often fail to learn strategies for overcoming them. Deep neuroevolution and novelty search have been proposed as effective alternatives to gradient-based methods for learning RL policies directly from pixels. In this paper, we introduce and evaluate the use of novelty search over agent action sequences by string edit metric distance as a means for promoting innovation. We also introduce a method for stagnation detection and population resampling inspired by recent developments in the RL community that uses the same mechanisms as novelty search to promote and develop innovative policies. Our methods extend a state-of-the-art method for deep neuroevolution using a simple-yet-effective genetic algorithm (GA) designed to efficiently learn deep RL policy network weights. Experiments using four games from the Atari 2600 benchmark were conducted. Results provide further evidence that GAs are competitive with gradient-based algorithms for deep RL. Results also demonstrate that novelty search over action sequences is an effective source of selection pressure that can be integrated into existing evolutionary algorithms for deep RL.
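To make the idea of comparing action sequences by string edit distance concrete, the following is a minimal sketch rather than the authors' implementation. It assumes Levenshtein distance as the edit metric, integer-coded actions, and a novelty score defined as the mean distance to the `k` nearest sequences in an archive. The names `edit_distance`, `novelty`, `archive`, and `k` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two action sequences.

    Uses the standard two-row dynamic program; unit cost for
    insertion, deletion, and substitution (an assumption).
    """
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def novelty(actions: Sequence[int],
            archive: List[Sequence[int]],
            k: int = 3) -> float:
    """Mean edit distance to the k nearest archived action sequences.

    Hypothetical scoring rule for illustration; an empty archive
    yields 0.0 by convention here.
    """
    if not archive:
        return 0.0
    nearest = sorted(edit_distance(actions, other) for other in archive)[:k]
    return sum(nearest) / len(nearest)
```

Under these assumptions, a policy whose episode produces an action sequence far from everything in the archive receives a high score, which a GA could use as selection pressure in place of, or alongside, game reward. Full-length Atari episodes can run to thousands of actions, so the quadratic cost of this dynamic program may matter in practice.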

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.03142/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1902.03142/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1902.03142/full.md

---
Source: https://tomesphere.com/paper/1902.03142