Intrinsic Rewards from Self-Organizing Feature Maps for Exploration in Reinforcement Learning

Marius Lindegaard; Hjalmar Jacob Vinje; Odin Aleksander Severinsen

arXiv:2302.04125·cs.LG·February 9, 2023

TL;DR

This paper proposes an intrinsic reward mechanism that uses self-organizing feature maps and adaptive resonance theory (ART) to improve exploration in deep reinforcement learning, reaching human-level performance on the game Ordeal.

Contribution

It introduces an exploration bonus based on ART clustering that reaches human-level play in a number of training epochs comparable to ICM and outperforms RND within the hyperparameter space tested.

Findings

01 Achieved human-level performance on the game Ordeal.

02 Outperformed agents augmented with RND within the hyperparameter space tested.

03 Demonstrated online, unsupervised quantification of state novelty.

Abstract

We introduce an exploration bonus for deep reinforcement learning methods calculated using self-organising feature maps. Our method uses adaptive resonance theory (ART), which provides online, unsupervised clustering, to quantify the novelty of a state. This heuristic is used to add an intrinsic reward to the extrinsic reward signal, and the agent is then optimized to maximize the sum of these two rewards. We find that this method was able to play the game Ordeal at a human level after a number of training epochs comparable to ICM (arXiv:1705.05464). Agents augmented with RND (arXiv:1810.12894) were unable to achieve the same level of performance in our space of hyperparameters.
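To make the idea in the abstract concrete, the following is a minimal sketch of a novelty bonus driven by online clustering with a vigilance threshold. It is not the authors' implementation: the class name `VigilanceClusterer`, the cosine similarity measure, the prototype update rule, the `1 - similarity` novelty score, and the `scale` factor are all assumptions chosen for illustration, and a real ART network differs in its matching and learning rules.

```python
import math


class VigilanceClusterer:
    """Simplified ART-style online clusterer (hypothetical, for illustration only)."""

    def __init__(self, vigilance=0.9, learning_rate=0.5):
        self.vigilance = vigilance          # minimum similarity to join a cluster
        self.learning_rate = learning_rate  # how far a prototype moves toward a match
        self.prototypes = []                # one prototype vector per cluster

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm > 0 else 0.0

    def novelty(self, state):
        """Return a novelty score in [0, 1] and update the clusters online."""
        state = [float(v) for v in state]
        best_idx, best_sim = -1, -1.0
        for i, proto in enumerate(self.prototypes):
            sim = self._cosine(state, proto)
            if sim > best_sim:
                best_idx, best_sim = i, sim

        # No cluster passes the vigilance test: open a new one, maximal novelty.
        if best_idx < 0 or best_sim < self.vigilance:
            self.prototypes.append(state)
            return 1.0

        # Otherwise move the winning prototype toward the state.
        lr = self.learning_rate
        proto = self.prototypes[best_idx]
        self.prototypes[best_idx] = [(1 - lr) * p + lr * s for p, s in zip(proto, state)]
        return max(0.0, 1.0 - best_sim)


def total_reward(extrinsic, novelty, scale=0.1):
    """Sum of extrinsic reward and scaled intrinsic (novelty) reward."""
    return extrinsic + scale * novelty
```

In a training loop under these assumptions, each observed state (or a feature embedding of it) would be passed to `novelty`, and the value returned by `total_reward` would replace the environment reward in the agent's update.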

Code & Models

Repositories

mariuslindegaard/curiosity_baselines
PyTorch · Official

Taxonomy

Topics: Neural Networks and Applications