# Diverse Exploration via Conjugate Policies for Policy Gradient Methods

**Authors:** Andrew Cohen, Xingye Qiao, Lei Yu, Elliot Way, Xiangrong Tong

arXiv: 1902.03633 · 2019-02-12

## TL;DR

This paper introduces diverse exploration (DE) for policy gradient methods. DE deploys a set of conjugate policies, obtained as a byproduct of conjugate gradient descent, to explore effectively while maintaining good policy performance.

## Contribution

It proposes diverse exploration (DE), which learns and deploys a set of conjugate policies generated as a byproduct of conjugate gradient descent, and supports the approach with both theoretical and empirical results.

## Key findings

- DE explores more effectively than exploration by random policy perturbations.
- DE improves policy performance in the empirical evaluation.
- Theoretical analysis supports DE's effectiveness at achieving exploration.

## Abstract

We address the challenge of effective exploration while maintaining good performance in policy gradient methods. As a solution, we propose diverse exploration (DE) via conjugate policies. DE learns and deploys a set of conjugate policies which can be conveniently generated as a byproduct of conjugate gradient descent. We provide both theoretical and empirical results showing the effectiveness of DE at achieving exploration, improving policy performance, and the advantage of DE over exploration by random policy perturbations.
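The abstract says conjugate policies come "as a byproduct of conjugate gradient descent." The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. Conjugate gradient (CG) is run on a system `F x = g`, where `F` is a symmetric positive-definite matrix standing in for a Fisher information matrix and `g` for a policy gradient. The search directions CG produces along the way are mutually `F`-conjugate (`d_i^T F d_j = 0` for `i != j`). Each direction then perturbs the current parameters `theta`. The helper name `conjugate_policies`, the `max_kl` budget, and the rule that scales each perturbation `delta` so that `0.5 * delta^T F delta == max_kl` are all hypothetical illustrative choices. They are not taken from the paper.

```python
# Sketch (assumption, not the paper's code): F-conjugate CG search directions
# reused as perturbations that define a set of "conjugate policies".

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, iters):
    """Solve A x = b by CG; also return the A-conjugate search directions."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction is the residual
    rr = dot(r, r)
    directions = []
    for _ in range(iters):
        if rr < 1e-20:     # converged; no further directions are generated
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        directions.append(list(p))   # the byproduct: one conjugate direction per step
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, directions

def conjugate_policies(F, theta, directions, max_kl):
    """Hypothetical helper: theta + delta_i with 0.5 * delta_i^T F delta_i == max_kl."""
    policies = []
    for d in directions:
        scale = (2.0 * max_kl / dot(d, matvec(F, d))) ** 0.5
        policies.append([t + scale * di for t, di in zip(theta, d)])
    return policies

# Toy example: a small SPD matrix as a stand-in Fisher matrix.
F = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 2.0]]
g = [1.0, 2.0, 3.0]
theta = [0.1, -0.2, 0.3]

x, dirs = conjugate_gradient(F, g, iters=3)
policies = conjugate_policies(F, theta, dirs, max_kl=0.01)

# Off-diagonal d_i^T F d_j values should be close to zero (conjugacy).
for i in range(len(dirs)):
    for j in range(i + 1, len(dirs)):
        print(i, j, dot(dirs[i], matvec(F, dirs[j])))
```

The design choice here is that the directions cost nothing extra. CG already computes them while solving for the update step `x`, so a diverse set of perturbed policies comes at essentially no additional cost. The KL-style scaling is one plausible way to keep each perturbed policy close to `theta`. It is an assumption of this sketch.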

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.03633/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1902.03633/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1902.03633/full.md

---
Source: https://tomesphere.com/paper/1902.03633