Maximum Entropy Model-based Reinforcement Learning

Oleg Svidchenko; Aleksei Shpilman

arXiv:2112.01195·cs.AI·December 3, 2021

Open Access

TL;DR

This paper introduces a novel exploration method tailored for model-based reinforcement learning, significantly enhancing the sample efficiency and performance of the Dreamer algorithm in complex control tasks.

Contribution

The paper presents a new exploration technique designed specifically for model-based RL, addressing the lack of exploration-driven sample efficiency improvements for such algorithms.

Findings

01. The proposed method improves Dreamer's performance in high-dimensional tasks.

02. Experimental results show increased sample efficiency and better exploration.

03. The approach outperforms existing exploration strategies in model-based RL.

Abstract

Recent advances in reinforcement learning have demonstrated its ability to solve hard agent-environment interaction tasks at a super-human level. However, the application of reinforcement learning methods to practical, real-world tasks is currently limited by the sample inefficiency of most state-of-the-art RL algorithms, i.e., their need for a vast number of training episodes. For example, the OpenAI Five algorithm, which has beaten human players in Dota 2, was trained for thousands of years of game time. Several approaches tackle the issue of sample inefficiency: they either offer more efficient usage of already gathered experience or aim to gain more relevant and diverse experience via better exploration of the environment. However, to our knowledge, no such approach exists for model-based algorithms, which have shown high sample efficiency in solving hard control tasks with…

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research