Cyclophobic Reinforcement Learning

Stefan Sylvius Wagner; Peter Arndt; Jan Robine; Stefan Harmeling

arXiv:2308.15911 · cs.LG · August 31, 2023

TL;DR

This paper introduces a cyclophobic intrinsic reward for reinforcement learning that discourages redundant cycles, leading to more efficient exploration in complex, sparse-reward environments like MiniGrid and MiniHack.

Contribution

The paper proposes a novel cyclophobic intrinsic reward that avoids cycles, enhancing exploration efficiency in sparse-reward environments, and combines it with hierarchical representations for improved performance.

Findings

01. Outperforms previous methods in MiniGrid and MiniHack environments

02. Achieves higher sample efficiency in complex exploration tasks

03. Demonstrates the effectiveness of cycle avoidance in reinforcement learning

Abstract

In environments with sparse rewards, finding a good inductive bias for exploration is crucial to the agent's success. However, there are two competing goals: novelty search and systematic exploration. While existing approaches such as curiosity-driven exploration find novelty, they sometimes do not systematically explore the whole state space, akin to depth-first search vs. breadth-first search. In this paper, we propose a new intrinsic reward that is cyclophobic, i.e., it does not reward novelty, but punishes redundancy by avoiding cycles. Augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations based on the agent's cropped observations, we are able to achieve excellent results in the MiniGrid and MiniHack environments. Both are particularly hard, as they require complex interactions with different objects in order to be solved. Detailed comparisons…
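The idea of punishing redundancy rather than rewarding novelty can be illustrated with a small sketch. This is not the paper's formulation: the function name `cyclophobic_penalties`, the fixed `penalty` value, and the rule "penalize any step that returns to a state already seen in the same episode" are all assumptions chosen for illustration, and the hierarchical cropped observations are left out entirely.

```python
from typing import Hashable, Iterable, List, Set


def cyclophobic_penalties(
    states: Iterable[Hashable], penalty: float = -1.0
) -> List[float]:
    """Return one intrinsic reward per step of a single episode.

    Hypothetical rule (an assumption, not taken from the paper): a step
    receives `penalty` when it lands on a state that was already visited
    earlier in the same episode, i.e. it closes a cycle, and 0.0 otherwise.
    """
    seen: Set[Hashable] = set()
    rewards: List[float] = []
    for state in states:
        # A repeated state means the agent has walked a cycle back to it.
        rewards.append(penalty if state in seen else 0.0)
        seen.add(state)
    return rewards


# Hypothetical trajectory of grid positions; the agent backtracks to (0, 0).
trajectory = [(0, 0), (0, 1), (1, 1), (0, 1), (0, 0), (1, 0)]
intrinsic = cyclophobic_penalties(trajectory)
print(intrinsic)
```

In this sketch only the two backtracking steps are penalized, and first visits receive no bonus, which is the distinction the abstract draws against novelty-based rewards. In practice such a signal would be added to the sparse extrinsic reward.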

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Visual Attention and Saliency Detection