Depth and nonlinearity induce implicit exploration for RL
Justas Dauparas, Ryota Tomioka, and Katja Hofmann

TL;DR
This paper reveals that in reinforcement learning, nonlinear Q-functions with sufficient depth can implicitly facilitate exploration, enabling effective learning without explicit exploration strategies like epsilon-greedy.
Contribution
It demonstrates that depth and nonlinearity in Q-networks can induce implicit exploration, challenging the necessity of explicit exploration methods in RL.
Findings
Nonlinear Q-functions can learn benchmark tasks without explicit exploration.
Network depth and nonlinearity are crucial for implicit exploration.
On several standard benchmark tasks, including mountain car, purely greedy Q-learning matches or outperforms ε-greedy exploration.
Abstract
The question of how to explore, i.e., take actions with uncertain outcomes to learn about possible future rewards, is a key question in reinforcement learning (RL). Here, we show a surprising result: Q-learning with a nonlinear Q-function and no explicit exploration (i.e., a purely greedy policy) can learn several standard benchmark tasks, including mountain car, as well as, or better than, the most commonly used ε-greedy exploration. We carefully examine this result and show that both the depth of the Q-network and the type of nonlinearity are important to induce such deterministic exploration.
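The difference between the two action-selection rules compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`greedy_action`, `epsilon_greedy_action`, `q_learning_target`) are hypothetical, the Q-values are passed in as a plain list rather than produced by a deep network, and the target shown is the standard one-step Q-learning target, which the abstract does not spell out.

```python
import random


def greedy_action(q_values):
    """Purely greedy rule: pick the action with the largest Q-value.

    Ties are broken in favour of the lowest action index, so the rule
    is fully deterministic.
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Explicit exploration: with probability epsilon take a uniformly
    random action, otherwise fall back to the greedy rule."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def q_learning_target(reward, next_q_values, gamma, done):
    """Standard one-step Q-learning target (assumed, not from the paper):
    r + gamma * max_a' Q(s', a'), or just r at a terminal state."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

With `epsilon` set to 0, `epsilon_greedy_action` reduces to `greedy_action`, which is the no-explicit-exploration setting the abstract describes. Any exploration in that setting has to come from how the Q-function estimates change during training, not from injected randomness.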
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Advanced Bandit Algorithms Research
Methods: Q-Learning
