DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

Wentse Chen; Shiyu Huang; Yuan Chiang; Tim Pearce; Wei-Wei Tu; Ting Chen; Jun Zhu

arXiv:2207.05631·cs.LG·January 9, 2024·1 cites

TL;DR

DGPO is an on-policy reinforcement learning algorithm that efficiently discovers multiple diverse strategies for a task using a shared policy network, balancing diversity and reward through an information-theoretic intrinsic reward.

Contribution

The paper introduces a diversity-guided optimization method that finds multiple strategies with a single shared policy network trained over a single run, unlike prior approaches that require separate policies.

Findings

1. DGPO discovers more diverse strategies than baselines.
2. DGPO achieves comparable rewards with better sample efficiency.
3. The method effectively balances diversity and task performance.

Abstract

Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or to improve the robustness of a policy to an unexpected perturbation. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternately constrains the diversity of the strategies and the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and use policy iteration to maximize the derived lower…
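
The abstract does not give the form of the intrinsic reward. As a rough illustration only, the sketch below shows one common information-theoretic diversity bonus, a mutual-information-style term log q(z|s) - log p(z) computed from a strategy discriminator. This is an assumption for illustration and may differ from DGPO's actual objective; the function name `diversity_intrinsic_reward`, the discriminator logits, and the uniform prior over strategies are all hypothetical.

```python
import numpy as np

def diversity_intrinsic_reward(disc_logits, z, num_strategies):
    """Illustrative diversity bonus: log q(z|s) - log p(z).

    disc_logits: array of shape (batch, num_strategies), the output of a
        hypothetical discriminator that guesses the strategy index from a state.
    z: integer array of shape (batch,), the strategy index actually used.
    Assumes a uniform prior p(z) = 1 / num_strategies (an assumption).
    """
    # Numerically stable log-softmax over the strategy dimension.
    shifted = disc_logits - disc_logits.max(axis=-1, keepdims=True)
    log_q = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability the discriminator assigns to the true strategy index.
    log_q_z = log_q[np.arange(len(z)), z]
    # Subtracting log p(z) = -log(num_strategies) is the same as adding log K.
    return log_q_z + np.log(num_strategies)
```

Under these assumptions the bonus is zero when the discriminator cannot tell strategies apart, positive when a state clearly identifies its own strategy, and negative when a state looks like it came from a different strategy.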

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Human Pose and Action Recognition