First-Explore, then Exploit: Meta-Learning to Solve Hard Exploration-Exploitation Trade-Offs
Ben Norman, Jeff Clune

TL;DR
This paper introduces First-Explore, a meta-reinforcement learning (meta-RL) method that learns separate exploration and exploitation policies. Separating the two lets it learn exploration strategies that sacrifice early reward, which improves performance in challenging domains with complex exploration-exploitation trade-offs.
Contribution
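First-Explore addresses a limitation of existing meta-RL methods, which train a single policy to maximize cumulative reward and therefore struggle to learn exploration that sacrifices reward in early episodes. By learning distinct policies for exploration and exploitation, First-Explore can learn such reward-sacrificing exploration strategies.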
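The two-phase "first explore, then exploit" structure can be sketched on a toy multi-armed bandit. This is a minimal illustration under stated assumptions, not the authors' implementation: in First-Explore both policies are learned, whereas here `explore_policy` and `exploit_policy` are hypothetical hand-written stand-ins, rewards are deterministic, and each "episode" is a single arm pull. The explore policy acts for the first `k_explore` episodes and builds up a shared context; the exploit policy then acts on that context.

```python
from typing import List, Sequence, Tuple

# Context: the (arm, reward) pairs observed so far in the current environment.
Context = List[Tuple[int, float]]


def exploit_policy(context: Context, n_arms: int) -> int:
    """Hypothetical stand-in for a learned exploit policy: pull the arm with
    the highest observed reward, or arm 0 if nothing has been observed."""
    if not context:
        return 0
    return max(context, key=lambda pair: pair[1])[0]


def explore_policy(context: Context, n_arms: int) -> int:
    """Hypothetical stand-in for a learned explore policy: systematic search
    that pulls the lowest-numbered arm not tried yet, so no arm is repeated."""
    tried = {arm for arm, _ in context}
    for arm in range(n_arms):
        if arm not in tried:
            return arm
    # Every arm has been tried; nothing new to learn, so fall back to exploiting.
    return exploit_policy(context, n_arms)


def first_explore_rollout(arm_means: Sequence[float], n_episodes: int, k_explore: int) -> float:
    """Explore for the first k_explore episodes, then exploit for the rest.
    Each episode is one pull of a deterministic bandit; returns total reward."""
    n_arms = len(arm_means)
    context: Context = []
    total = 0.0
    for episode in range(n_episodes):
        policy = explore_policy if episode < k_explore else exploit_policy
        arm = policy(context, n_arms)
        reward = arm_means[arm]
        context.append((arm, reward))  # both phases share one growing context
        total += reward
    return total
```

With `arm_means = [0.1, 0.9, 0.5]`, ten episodes and `k_explore=3`, the rollout tries each arm once and then pulls arm 1 for the remaining seven episodes (total of about 7.8). With `k_explore=0`, the exploit policy never leaves arm 0 (total of about 1.0), a small picture of how pure exploitation can get stuck.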
Findings
Outperforms existing meta-RL methods in exploration tasks
Effectively learns to explore even when it sacrifices early rewards
Enhances meta-RL's ability to handle complex exploration-exploitation trade-offs
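Why sacrificing early reward can pay off is visible in a hand-computed toy case. Assume a hypothetical four-armed bandit in which arm 0 is a known "safe" arm paying 0.5 and arms 1 to 3 are untested, only one of which pays 1.0. The sketch scores two exploration strategies both by the reward they collect themselves and by the return an exploit phase achieves afterwards; the setup, function names, and numbers are illustrative assumptions, not results from the paper.

```python
from typing import List, Tuple

# Hypothetical toy bandit: arm 0 is a known "safe" arm; arms 1-3 are untested.
ARM_MEANS = [0.5, 0.0, 0.0, 1.0]


def run_explore(strategy: str, k: int) -> List[Tuple[int, float]]:
    """Return the (arm, reward) context produced by k exploration episodes."""
    if strategy == "safe":
        arms = [0] * k  # keep pulling the known safe arm
    elif strategy == "search":
        n_untested = len(ARM_MEANS) - 1
        arms = [1 + i % n_untested for i in range(k)]  # try untested arms in turn
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [(arm, ARM_MEANS[arm]) for arm in arms]


def exploit_return(context: List[Tuple[int, float]], n_exploit: int) -> float:
    """Return of an exploit phase that repeatedly pulls the best arm known so far
    (arm 0's payoff is treated as known in advance)."""
    best_reward = max([ARM_MEANS[0]] + [reward for _, reward in context])
    return best_reward * n_exploit


def score(strategy: str, k: int, n_exploit: int) -> Tuple[float, float, float]:
    """(reward earned while exploring, exploit return afterwards, total)."""
    context = run_explore(strategy, k)
    explore_reward = sum(reward for _, reward in context)
    later = exploit_return(context, n_exploit)
    return explore_reward, later, explore_reward + later
```

Here `score("safe", 3, 17)` returns `(1.5, 8.5, 10.0)` and `score("search", 3, 17)` returns `(1.0, 17.0, 18.0)`: searching earns less while exploring but more overall. Judging exploration only by the reward it collects itself would favor the safe strategy; judging it by what it enables the exploit phase to do favors searching.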
Abstract
Standard reinforcement learning (RL) agents never intelligently explore like a human (i.e., taking into account complex domain priors and adapting quickly based on previous exploration). Across episodes, RL agents struggle to perform even simple exploration strategies, for example systematic search that avoids exploring the same location multiple times. This poor exploration limits performance on challenging domains. Meta-RL is a potential solution because, unlike standard RL, meta-RL can learn to explore, and can potentially learn highly complex strategies far beyond those of standard RL, such as experimenting in early episodes to learn new skills or conducting experiments to learn about the current environment. Traditional meta-RL focuses on the problem of learning to optimally balance exploration and exploitation to maximize the cumulative reward of the episode sequence (e.g.,…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques
