SEREN: Knowing When to Explore and When to Exploit
Changmin Yu, David Mguni, Dong Li, Aivar Sootla, Jun Wang, Neil Burgess

TL;DR
SEREN introduces a novel game-theoretic approach to balance exploration and exploitation in reinforcement learning by dynamically switching between policies, leading to faster convergence and improved performance across benchmarks.
Contribution
The paper proposes SEREN, a new method that models exploration-exploitation as a game between two RL agents, enabling systematic and adaptive exploration strategies.
Findings
SEREN converges quickly and naturally shifts towards exploitation.
Combining SEREN with existing RL algorithms improves performance.
Effective in both discrete and continuous control benchmarks.
Abstract
Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that maximise expected reward and "explorative" ones that sample unvisited states. To encourage exploration, recent approaches have proposed adding stochasticity to actions, separating exploration and exploitation phases, or equating reduction in uncertainty with reward. However, these techniques do not necessarily offer entirely systematic approaches to making this trade-off. Here we introduce the SElective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game between two RL agents: Exploiter, which purely exploits known rewards, and Switcher, which chooses at which states to activate a pure exploration policy that is trained to minimise system uncertainty and override Exploiter. Using a form of policies known as impulse control, Switcher…
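The abstract describes the Exploiter/Switcher game only at a high level. The following is a minimal tabular sketch of that two-agent control loop under stated assumptions, not the authors' implementation. The chain environment, the count-based uncertainty proxy used by the exploration policy, the fixed per-intervention cost, the Switcher's reward, and all names (`run`, `step`, `greedy`, `SWITCH_COST`, `BETA`) are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

# All constants below are hypothetical choices for illustration, not values from the paper.
N_STATES = 6         # chain of states 0..5; reward only on reaching state 5
ACTIONS = (0, 1)     # 0 = move left, 1 = move right
SWITCH_COST = 0.05   # fixed cost charged each time Switcher intervenes (impulse-control style)
BETA = 0.1           # scale of the count-based uncertainty bonus (assumed proxy)
ALPHA, GAMMA, EPS = 0.5, 0.95, 0.1


def step(state, action):
    """Deterministic chain; reaching the last state gives reward 1.0 and ends the episode."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state, options, rng):
    """Greedy choice over `options` with random tie-breaking."""
    best = max(q[(state, o)] for o in options)
    return rng.choice([o for o in options if q[(state, o)] == best])


def run(episodes=200, max_steps=50, seed=0):
    rng = random.Random(seed)
    q_exploit = defaultdict(float)  # Exploiter: Q-values over environment actions
    q_switch = defaultdict(float)   # Switcher: Q-values over {0: let Exploiter act, 1: intervene}
    visits = defaultdict(int)       # state-action visit counts (assumed uncertainty proxy)
    interventions = []              # number of Switcher interventions per episode
    for _ in range(episodes):
        state, n_int = 0, 0
        for _ in range(max_steps):
            # Switcher decides, epsilon-greedily, whether to override Exploiter in this state.
            if rng.random() < EPS:
                sw = rng.choice((0, 1))
            else:
                sw = greedy(q_switch, state, (0, 1), rng)
            if sw == 1:
                # Exploration policy: pick the least-visited (most "uncertain") action.
                action = min(ACTIONS, key=lambda a: (visits[(state, a)], rng.random()))
                n_int += 1
            else:
                action = greedy(q_exploit, state, ACTIONS, rng)
            visits[(state, action)] += 1
            nxt, r, done = step(state, action)

            # Exploiter learns off-policy from every transition, including explored ones.
            target = r if done else r + GAMMA * max(q_exploit[(nxt, a)] for a in ACTIONS)
            q_exploit[(state, action)] += ALPHA * (target - q_exploit[(state, action)])

            # Switcher: environment reward plus, when intervening, an uncertainty bonus
            # minus the fixed intervention cost (an assumed objective).
            bonus = BETA / visits[(state, action)] ** 0.5
            sw_r = r + (bonus - SWITCH_COST if sw == 1 else 0.0)
            sw_target = sw_r if done else sw_r + GAMMA * max(q_switch[(nxt, o)] for o in (0, 1))
            q_switch[(state, sw)] += ALPHA * (sw_target - q_switch[(state, sw)])

            state = nxt
            if done:
                break
        interventions.append(n_int)
    return q_exploit, interventions
```

The fixed `SWITCH_COST` is what gives the sketch its impulse-control flavour: under these assumptions, intervening only pays off where the bonus `BETA / sqrt(visits)` still exceeds the cost, i.e. for state-action pairs visited at most four times. Calling `q, ints = run()` and comparing `ints[:10]` with `ints[-10:]` is one way to inspect whether interventions become rarer as visit counts grow; the sketch makes no claim about reproducing the paper's results.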
Taxonomy
Topics: Reinforcement Learning in Robotics
