More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences
Toby Johnstone, Nathan Grinsztajn, Johan Ferret, Philippe Preux

TL;DR
This paper introduces a novel exploration strategy for reinforcement learning that leverages symbolic priors about action sequence equivalences, improving efficiency by reducing redundant exploration.
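To make "redundant exploration" concrete, here is a toy sketch that is not taken from the paper. It assumes a hypothetical wall-free grid world with four unit moves, where moves commute and opposite moves cancel, so many action sequences reach the same state. The code counts length-k action sequences against the distinct states they reach, and computes the collision probability: the chance that two independent, uniformly random rollouts end in the same state. The names `ACTIONS`, `end_state` and `redundancy_report` are illustrative only.

```python
from collections import Counter
from itertools import product

# Hypothetical wall-free 2-D grid world: each action is a unit displacement.
# Moves commute and opposite moves cancel, a simple case of sequence equivalence.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def end_state(seq):
    """Position reached from the origin after executing an action sequence."""
    x, y = 0, 0
    for a in seq:
        dx, dy = ACTIONS[a]
        x, y = x + dx, y + dy
    return (x, y)


def redundancy_report(k):
    """Return (#length-k sequences, #distinct end states, collision probability)
    for uniformly random exploration over k steps."""
    seqs = list(product(ACTIONS, repeat=k))  # iterating the dict yields action names
    counts = Counter(end_state(s) for s in seqs)
    n = len(seqs)
    # Probability that two independent uniform rollouts end in the same state.
    collision = sum((c / n) ** 2 for c in counts.values())
    return n, len(counts), collision


for k in (1, 2, 3, 4):
    n_seqs, n_states, coll = redundancy_report(k)
    print(f"k={k}: {n_seqs} sequences, {n_states} distinct states, collision={coll:.4f}")
```

For k = 2 this reports 16 sequences but only 9 distinct states, with a collision probability of 0.1406, compared with 1/9 (about 0.111) if those 9 states were reached uniformly.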
Contribution
It proposes a convex optimization-based local exploration method that exploits action sequence equivalences, enhancing exploration efficiency in reinforcement learning.
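A minimal sketch of why collision minimization can be posed as a convex problem, under an assumption made only for this illustration: exploration chooses a distribution q over whole length-k action sequences in the same hypothetical grid world, and the goal is to minimize the collision probability sum_s p(s)^2 of the induced end-state distribution p. Because p is linear in q, this objective is a convex quadratic over the probability simplex. Exponentiated-gradient (mirror) descent is swapped in here as a generic solver; this is not the paper's parametrization or algorithm, and `minimize_collisions`, `lr` and `steps` are hypothetical names.

```python
import math
from collections import defaultdict
from itertools import product

# Hypothetical wall-free grid world used as a toy model of action-sequence equivalence.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def end_state(seq):
    """Position reached from the origin after executing an action sequence."""
    x, y = 0, 0
    for a in seq:
        dx, dy = ACTIONS[a]
        x, y = x + dx, y + dy
    return (x, y)


def state_probs(q, ends):
    """Push a distribution over sequences forward to a distribution over end states."""
    p = defaultdict(float)
    for seq, w in q.items():
        p[ends[seq]] += w
    return p


def minimize_collisions(k, lr=1.0, steps=500):
    """Minimize sum_s p(s)^2 over distributions q on length-k sequences.

    p = M q is linear in q, so the objective is convex on the simplex.
    The multiplicative (mirror descent) update keeps q a valid distribution.
    """
    seqs = list(product(ACTIONS, repeat=k))
    ends = {s: end_state(s) for s in seqs}
    q = {s: 1.0 / len(seqs) for s in seqs}  # start from uniform random exploration
    for _ in range(steps):
        p = state_probs(q, ends)
        # Each sequence reaches exactly one state, so d/dq_j sum_s p(s)^2 = 2 * p(end(j)).
        q = {s: w * math.exp(-lr * 2.0 * p[ends[s]]) for s, w in q.items()}
        z = sum(q.values())
        q = {s: w / z for s, w in q.items()}
    p = state_probs(q, ends)
    return q, sum(v * v for v in p.values())


q_opt, coll = minimize_collisions(k=2)
print(f"collision after optimization: {coll:.4f}")
```

In this toy every reachable state can receive equal mass, so the optimum is 1/(number of reachable states), i.e. 1/9 for k = 2, and the printed value approaches it. Mirror descent was chosen because its update needs no projection back onto the simplex; a generic quadratic-programming solver would also work.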
Findings
Strategy reduces exploration collisions
Improves state visitation efficiency
Effective across various dynamic environments
Abstract
Incorporating prior knowledge in reinforcement learning algorithms is largely an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost by solving a convex optimization problem. By replacing the usual epsilon-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamic structures.
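The abstract says the strategy replaces epsilon-greedy in a DQN. The sketch below shows one hypothetical way such a swap could look in an action-selection loop; it is not the paper's implementation. It assumes actions are integer indices and that a distribution over exploratory action sequences (for example, a collision-minimizing one) has been computed beforehand. With probability `epsilon` the agent commits to a sequence sampled from that distribution instead of a single uniformly random action; otherwise it acts greedily on the Q-values. `SequenceExplorer`, `seq_dist`, `select_action` and `reset` are illustrative names.

```python
import random


class SequenceExplorer:
    """Hypothetical drop-in replacement for epsilon-greedy action selection.

    seq_dist maps tuples of action indices to probabilities (assumed precomputed).
    """

    def __init__(self, seq_dist, epsilon, rng=None):
        self.seqs = list(seq_dist.keys())
        self.weights = list(seq_dist.values())
        self.epsilon = epsilon
        self.rng = rng if rng is not None else random.Random()
        self.pending = []  # remaining actions of the current exploratory sequence

    def reset(self):
        """Drop any unfinished exploratory sequence, e.g. at the end of an episode."""
        self.pending = []

    def select_action(self, q_values):
        # Finish an exploratory sequence once it has been started.
        if self.pending:
            return self.pending.pop(0)
        # Explore: sample a whole sequence rather than one uniform action.
        if self.rng.random() < self.epsilon:
            seq = self.rng.choices(self.seqs, weights=self.weights, k=1)[0]
            self.pending = list(seq[1:])
            return seq[0]
        # Exploit: greedy action with respect to the current Q-values.
        return max(range(len(q_values)), key=q_values.__getitem__)


# Toy usage with made-up sequences and Q-values.
explorer = SequenceExplorer({(0, 2): 0.5, (1, 3): 0.5}, epsilon=0.1, rng=random.Random(0))
action = explorer.select_action([0.1, 0.7, 0.2, 0.0])
```

Committing to a full sequence is what lets a non-uniform distribution over sequences, rather than over single actions, shape where exploration goes.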
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · VLSI and FPGA Design Techniques
Methods: Convolution · Q-Learning · Dense Connections · Deep Q-Network
