Regulating Reward Training by Means of Certainty Prediction in a Neural Network-Implemented Pong Game
Matt Oberdorfer, Matt Abuzalaf

TL;DR
This paper introduces a reinforcement learning model for Pong in which additional neural networks estimate how certain the agent is that its predicted paddle position will return the ball. The model outperforms a simpler architecture trained with random paddle placement.
Contribution
It presents a novel reinforcement learning approach that incorporates a certainty prediction neural network to enhance reward-based training in a Pong game.
Findings
The certainty-based model outperforms the simple architecture (random paddle placement during training) in gameplay.
The model surpasses a near-perfect opponent after additional training.
Certainty prediction improves reinforcement learning efficiency.
Abstract
We present the first reinforcement-learning model that self-improves its reward-modulated training by means of a continuously improving "intuition" neural network. An agent was trained to play the arcade video game Pong under two reward-based alternatives: one in which the paddle was placed randomly during training, and a second in which the paddle was simultaneously trained on three additional neural networks so that it could develop a sense of "certainty" about how likely its own predicted paddle position was to return the ball. If the agent was less than 95% certain of returning the ball, the policy used an intuition neural network to place the paddle. We trained both architectures for an equivalent number of epochs and tested learning performance by letting the trained programs play against a near-perfect opponent. Through this, we found that the reinforcement learning model…
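The 95% certainty rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `choose_paddle_position`, the three callables standing in for the policy, certainty, and intuition networks, and the representation of the game state as a sequence of floats are all assumptions made for illustration. Only the 0.95 threshold comes from the abstract.

```python
from typing import Callable, Sequence

# Threshold taken from the abstract ("less than 95% certain").
CERTAINTY_THRESHOLD = 0.95

# Hypothetical state representation, e.g. ball position and velocity.
State = Sequence[float]


def choose_paddle_position(
    state: State,
    policy_net: Callable[[State], float],
    certainty_net: Callable[[State, float], float],
    intuition_net: Callable[[State], float],
    threshold: float = CERTAINTY_THRESHOLD,
) -> float:
    """Return a paddle position using a certainty-gated fallback.

    All three "networks" are hypothetical stand-ins: any callable
    with the matching signature will do for this sketch.
    """
    # The policy proposes a paddle position for the current state.
    predicted = policy_net(state)
    # Estimated probability that this position returns the ball.
    certainty = certainty_net(state, predicted)
    # Below the threshold, defer to the intuition network instead.
    if certainty < threshold:
        return intuition_net(state)
    return predicted


if __name__ == "__main__":
    # Stub networks returning fixed values, purely for demonstration.
    state = [0.2, 0.5, -0.1, 0.3]
    position = choose_paddle_position(
        state,
        policy_net=lambda s: 0.7,
        certainty_net=lambda s, p: 0.5,
        intuition_net=lambda s: 0.1,
    )
    print(position)
```

Passing the networks in as callables keeps the gating rule separate from however the networks are actually trained, which the abstract does not specify.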
Taxonomy
Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Time Series Analysis and Forecasting
