More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences

Toby Johnstone; Nathan Grinsztajn; Johan Ferret; Philippe Preux

arXiv:2110.10632·cs.LG·November 9, 2021

Open Access

TL;DR

This paper introduces a novel exploration strategy for reinforcement learning that leverages symbolic priors about action sequence equivalences, improving efficiency by reducing redundant exploration.

Contribution

It proposes a convex optimization-based local exploration method that exploits action sequence equivalences, enhancing exploration efficiency in reinforcement learning.

Findings

01 The strategy reduces exploration collisions

02 It improves state visitation efficiency

03 It is effective across environments with various dynamic structures

Abstract

Incorporating prior knowledge in reinforcement learning algorithms remains largely an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost, by solving a convex optimization problem. By replacing the usual epsilon-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamic structures.
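
To make the notion of action sequence equivalence concrete, the sketch below is a toy illustration only. It is not the paper's convex optimization method. It assumes a hypothetical wall-free gridworld with four unit-move actions (the names `MOVES`, `end_state`, `equivalence_classes`, and `uniform_visit_distribution` are invented for this example) and counts how often uniformly random two-step action sequences collide on the same state.

```python
from itertools import product

# Hypothetical 4-action gridworld without walls: each action is a unit move.
# This environment is an assumption made for illustration, not from the paper.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def end_state(sequence):
    """Net displacement reached by a sequence of actions from the origin."""
    x = sum(MOVES[a][0] for a in sequence)
    y = sum(MOVES[a][1] for a in sequence)
    return (x, y)

def equivalence_classes(depth):
    """Group all action sequences of a given length by the state they reach."""
    classes = {}
    for seq in product(MOVES, repeat=depth):
        classes.setdefault(end_state(seq), []).append(seq)
    return classes

def uniform_visit_distribution(depth):
    """Probability of each end state when every action is drawn uniformly."""
    classes = equivalence_classes(depth)
    total = len(MOVES) ** depth
    return {state: len(seqs) / total for state, seqs in classes.items()}

classes = equivalence_classes(2)
dist = uniform_visit_distribution(2)
print(len(classes), "distinct states from", len(MOVES) ** 2, "sequences")
print("P(back at origin) =", dist[(0, 0)])
```

In this toy setting, 16 two-step sequences reach only 9 distinct states, and a quarter of uniformly sampled sequences return to the starting state. An exploration strategy that knows such equivalences could, in principle, shift probability away from over-represented classes, which is the kind of collision the abstract describes wanting to minimize.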

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · VLSI and FPGA Design Techniques

Methods: Convolution · Q-Learning · Dense Connections · Deep Q-Network