Efficient Deep Reinforcement Learning with Predictive Processing Proximal Policy Optimization

Burcu Küçükoğlu; Walraaf Borkent; Bodo Rueckauer; Nasir Ahmad; Umut Güçlü; Marcel van Gerven

arXiv:2211.06236 · cs.LG · January 30, 2024

TL;DR

This paper introduces P4O, a reinforcement learning agent that integrates predictive processing inspired by neuroscience, leading to significant improvements in sample efficiency and performance on Atari games, surpassing human levels in some cases.

Contribution

The paper presents P4O, a novel RL agent combining predictive processing with PPO, demonstrating enhanced efficiency and performance without hyperparameter tuning.

Findings

1. P4O outperforms baseline recurrent PPO on Atari games.
2. P4O surpasses state-of-the-art agents within the same training time.
3. P4O exceeds human performance on multiple challenging Atari games.

Abstract

Advances in reinforcement learning (RL) often rely on massive compute resources and remain notoriously sample inefficient. In contrast, the human brain is able to efficiently learn effective control strategies using limited resources. This raises the question of whether insights from neuroscience can be used to improve current RL methods. Predictive processing is a popular theoretical framework which maintains that the human brain is actively seeking to minimize surprise. We show that recurrent neural networks which predict their own sensory states can be leveraged to minimize surprise, yielding substantial gains in cumulative reward. Specifically, we present the Predictive Processing Proximal Policy Optimization (P4O) agent, an actor-critic reinforcement learning agent that applies predictive processing to a recurrent variant of the PPO algorithm by integrating a world model in its hidden…
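The abstract describes combining a PPO objective with a term that penalizes surprise, meaning the error in predicting the agent's own sensory states. The excerpt here does not give the paper's actual loss, so the following is only a minimal sketch of the general idea. The PPO clipped surrogate is the standard one; the squared-error surprise measure, the weighting `beta`, and all function names are assumptions made for illustration and are not taken from the paper or its repository.

```python
# Illustrative sketch only. The squared-error surprise term and the weight
# `beta` are assumptions; they are not the P4O loss from the paper.

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped objective for one sample (a quantity to maximize)."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def surprise(predicted, observed):
    """Mean squared error between predicted and observed sensory states."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


def combined_loss(ratios, advantages, predicted, observed, eps=0.2, beta=1.0):
    """Loss to minimize: negative mean surrogate plus beta-weighted surprise."""
    policy_term = sum(
        clipped_surrogate(r, a, eps) for r, a in zip(ratios, advantages)
    ) / len(ratios)
    return -policy_term + beta * surprise(predicted, observed)
```

In this sketch a larger prediction error raises the loss, so gradient descent on `combined_loss` would push a network toward both higher expected reward and lower surprise. How P4O actually balances the two is specified in the paper, not here.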

Code & Models

Repositories

burcukoglu/p4o-predictiveprocessingppo (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · EEG and Brain-Computer Interfaces · Traffic control and management

Methods: Entropy Regularization · Proximal Policy Optimization