Efficient Deep Reinforcement Learning with Predictive Processing Proximal Policy Optimization
Burcu Küçükoğlu, Walraaf Borkent, Bodo Rueckauer, Nasir Ahmad, Umut Güçlü, Marcel van Gerven

TL;DR
This paper introduces P4O, a reinforcement learning agent that integrates predictive processing inspired by neuroscience, leading to significant improvements in sample efficiency and performance on Atari games, surpassing human levels in some cases.
Contribution
The paper presents P4O, a novel RL agent combining predictive processing with PPO, demonstrating enhanced efficiency and performance without hyperparameter tuning.
Findings
P4O outperforms baseline recurrent PPO on Atari games.
P4O surpasses state-of-the-art agents within the same training time.
P4O exceeds human performance on multiple challenging Atari games.
Abstract
Advances in reinforcement learning (RL) often rely on massive compute resources and remain notoriously sample inefficient. In contrast, the human brain is able to efficiently learn effective control strategies using limited resources. This raises the question whether insights from neuroscience can be used to improve current RL methods. Predictive processing is a popular theoretical framework which maintains that the human brain is actively seeking to minimize surprise. We show that recurrent neural networks which predict their own sensory states can be leveraged to minimise surprise, yielding substantial gains in cumulative reward. Specifically, we present the Predictive Processing Proximal Policy Optimization (P4O) agent; an actor-critic reinforcement learning agent that applies predictive processing to a recurrent variant of the PPO algorithm by integrating a world model in its hidden…
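
The abstract describes pairing PPO with a recurrent network that predicts its own sensory states, but it does not spell out the objective here. As a rough illustration only, the sketch below adds a sensory prediction error to the standard PPO clipped surrogate loss. The mean-squared-error form, the weighting `pred_coef`, and all function names are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def ppo_clip_loss(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (negated so that lower is better)."""
    ratio = np.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))


def prediction_error(predicted_obs, actual_obs):
    """Hypothetical 'surprise' term: mean squared error between the
    predicted and the actually observed sensory input."""
    return np.mean((predicted_obs - actual_obs) ** 2)


def combined_loss(log_prob_new, log_prob_old, advantages,
                  predicted_obs, actual_obs, pred_coef=1.0, clip_eps=0.2):
    """Assumed combination: policy loss plus a weighted prediction error.
    pred_coef is an illustrative hyperparameter, not one from the paper."""
    policy_term = ppo_clip_loss(log_prob_new, log_prob_old, advantages, clip_eps)
    surprise_term = prediction_error(predicted_obs, actual_obs)
    return policy_term + pred_coef * surprise_term


# Toy usage with made-up numbers: three transitions, four-dimensional observations.
log_p_old = np.array([-1.0, -0.5, -2.0])
log_p_new = log_p_old.copy()          # unchanged policy, so every ratio is 1
adv = np.array([1.0, -1.0, 2.0])
pred = np.zeros((3, 4))
obs = np.ones((3, 4))
loss = combined_loss(log_p_new, log_p_old, adv, pred, obs, pred_coef=0.5)
```

In a full agent, the value loss and an entropy bonus would normally be added as well, and the prediction would come from the recurrent network's hidden state rather than from fixed arrays.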
Taxonomy
Topics: Reinforcement Learning in Robotics · EEG and Brain-Computer Interfaces · Traffic control and management
Methods: Entropy Regularization · Proximal Policy Optimization
