SEREN: Knowing When to Explore and When to Exploit

Changmin Yu; David Mguni; Dong Li; Aivar Sootla; Jun Wang; Neil Burgess

arXiv:2205.15064 · cs.LG · July 1, 2022

TL;DR

SEREN introduces a novel game-theoretic approach to balance exploration and exploitation in reinforcement learning by dynamically switching between policies, leading to faster convergence and improved performance across benchmarks.

Contribution

The paper proposes SEREN, a new method that models exploration-exploitation as a game between two RL agents, enabling systematic and adaptive exploration strategies.

Findings

1. SEREN converges quickly and naturally shifts towards exploitation.

2. Combining SEREN with existing RL algorithms improves performance.

3. Effective in both discrete and continuous control benchmarks.

Abstract

Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that maximise expected reward and "explorative" ones that sample unvisited states. To encourage exploration, recent approaches proposed adding stochasticity to actions, separating exploration and exploitation phases, or equating reduction in uncertainty with reward. However, these techniques do not necessarily offer entirely systematic approaches to making this trade-off. Here we introduce the SElective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game between an RL agent, Exploiter, which purely exploits known rewards, and another RL agent, Switcher, which chooses at which states to activate a pure exploration policy that is trained to minimise system uncertainty and override Exploiter. Using a form of policies known as impulse control, Switcher…
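The switching idea in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: in SEREN the switcher is itself a learned RL agent, whereas the sketch below replaces it with a fixed rule. The function name `select_action`, the count-based uncertainty proxy, and the `threshold` parameter are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def select_action(state, q_values, visit_counts, n_actions, threshold=0.5):
    """Toy stand-in for a switcher/exploiter pair (hypothetical, not SEREN itself).

    Returns (action, explored), where explored is True when the exploration
    policy overrode the exploiter at this state.
    """
    counts = [visit_counts[(state, a)] for a in range(n_actions)]
    # Assumed uncertainty proxy: high when some action here was rarely tried.
    uncertainty = 1.0 / math.sqrt(1 + min(counts))
    if uncertainty > threshold:
        # "Switcher" intervenes: pick the least-tried action to reduce uncertainty.
        return counts.index(min(counts)), True
    # Otherwise the "exploiter" acts greedily on its current value estimates.
    qs = [q_values[(state, a)] for a in range(n_actions)]
    return qs.index(max(qs)), False


q_values = defaultdict(float)
visit_counts = defaultdict(int)

# Unvisited state: uncertainty is 1.0, so the exploration policy takes over.
first = select_action(0, q_values, visit_counts, n_actions=2)

# Well-visited state: uncertainty drops below the threshold, so the exploiter acts.
visit_counts[(0, 0)] = 10
visit_counts[(0, 1)] = 10
q_values[(0, 1)] = 1.0
second = select_action(0, q_values, visit_counts, n_actions=2)
```

As visit counts grow, the fixed rule stops intervening and control passes to the greedy policy, which loosely mirrors the shift towards exploitation listed in the findings.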

Taxonomy

Topics: Reinforcement Learning in Robotics