Depth and nonlinearity induce implicit exploration for RL

Justas Dauparas, Ryota Tomioka, and Katja Hofmann

arXiv:1805.11711 · cs.LG · May 31, 2018

TL;DR

This paper shows that in reinforcement learning, sufficiently deep nonlinear Q-functions can induce exploration implicitly. As a result, a purely greedy policy can learn several standard benchmark tasks without an explicit exploration strategy such as ε-greedy.

Contribution

It demonstrates that the depth and nonlinearity of a Q-network can induce implicit exploration. This calls into question whether explicit exploration methods are always necessary in RL.

Findings

01

Q-learning with a nonlinear Q-function and a purely greedy policy can learn several standard benchmark tasks, including mountain car, without explicit exploration.

02

Both the depth of the Q-network and the type of nonlinearity are important for inducing implicit exploration.

03

Implicit exploration can match or outperform the commonly used ε-greedy exploration.
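
Finding 02 refers to two properties of the Q-network: how many hidden layers it has and which activation function it uses. The page does not list the paper's architectures, so the following is only a hypothetical NumPy sketch of a Q-network with both properties exposed as parameters. The function name `make_q_network`, the layer width, the initialization, and the activation choices are all illustrative assumptions. Setting `activation="identity"` removes the nonlinearity, so the network reduces to a linear (affine) Q-function.

```python
import numpy as np

# Hypothetical activation choices; the paper's actual nonlinearities are not listed here.
ACTIVATIONS = {
    "relu": lambda x: np.maximum(x, 0.0),
    "tanh": np.tanh,
    "identity": lambda x: x,  # no nonlinearity: the network collapses to an affine map
}


def make_q_network(state_dim, n_actions, depth=2, width=64, activation="relu", seed=0):
    """Build a hypothetical MLP Q-network with `depth` hidden layers.

    Returns the list of (W, b) parameters and a function mapping a batch of
    states of shape (batch, state_dim) to Q-values of shape (batch, n_actions).
    """
    rng = np.random.default_rng(seed)
    sizes = [state_dim] + [width] * depth + [n_actions]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # Scaled Gaussian weights and zero biases (an assumed init, not the paper's).
        W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    act = ACTIVATIONS[activation]

    def q_values(states):
        h = np.asarray(states, dtype=float)
        for W, b in params[:-1]:
            h = act(h @ W + b)  # each hidden layer applies the chosen nonlinearity
        W_out, b_out = params[-1]
        return h @ W_out + b_out  # linear output layer: one Q-value per action

    return params, q_values


if __name__ == "__main__":
    # Example: a 2-D state (like mountain car's position and velocity) and 3 discrete actions.
    _, q = make_q_network(state_dim=2, n_actions=3, depth=2, activation="tanh")
    print(q(np.array([[-0.5, 0.0]])).shape)  # (1, 3)
```

Varying `depth` and `activation` while keeping everything else fixed is the kind of comparison Finding 02 describes. The specific settings the authors tested are not given on this page.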

Abstract

The question of how to explore, i.e., take actions with uncertain outcomes to learn about possible future rewards, is a key question in reinforcement learning (RL). Here, we show a surprising result: Q-learning with a nonlinear Q-function and no explicit exploration (i.e., a purely greedy policy) can learn several standard benchmark tasks, including mountain car, as well as or better than the most commonly used $\epsilon$-greedy exploration. We carefully examine this result and show that both the depth of the Q-network and the type of nonlinearity are important to induce such deterministic exploration.
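
The comparison in the abstract depends on how actions are chosen from the current Q-values. The following is a minimal sketch, assuming NumPy and a Q-function that returns one value per action. With `epsilon=0` the policy is purely greedy, which is the setting the paper studies. With `epsilon>0` it becomes the standard ε-greedy baseline. The function names, the tie-breaking rule, and the default discount `gamma=0.99` are illustrative assumptions, not the authors' code.

```python
import numpy as np


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy action selection; epsilon=0 gives the purely greedy policy."""
    if epsilon > 0.0 and rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniformly random action
    return int(np.argmax(q_values))  # exploit: ties go to the lowest index


def q_learning_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target r + gamma * max_a' Q(s', a'); no bootstrap at episode end."""
    if done:
        return float(reward)
    return float(reward + gamma * np.max(next_q_values))


def td_error(q_sa, reward, next_q_values, done, gamma=0.99):
    """Temporal-difference error that moves Q(s, a) toward the Q-learning target."""
    return q_learning_target(reward, next_q_values, done, gamma) - q_sa


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = np.array([0.1, 0.5, 0.2])
    print(select_action(q, epsilon=0.0, rng=rng))  # 1: the greedy policy always picks argmax
```

With `epsilon=0`, action choice is a deterministic function of the Q-values. In that case any exploration must come from how those values change as the network is trained, which matches the abstract's description of the effect as "deterministic exploration."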

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Advanced Bandit Algorithms Research

Methods: Q-Learning