Learning to Play Pong using Policy Gradient Learning
Somnuk Phon-Amnuaisuk

TL;DR
This paper explores end-to-end deep reinforcement learning for playing Pong directly from pixel inputs, using policy gradients and several neural network architectures, and it examines what the networks learn internally.
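Learning "directly from pixel inputs" still usually involves a light, game-agnostic reduction of each frame. The following is a minimal sketch that assumes a grayscale frame given as a 2-D list of intensities. It downsamples by striding, binarizes against a background value, and returns the difference between consecutive frames so that motion shows up in a single input vector. The striding, binarizing, differencing and all function names are assumptions (this kind of preprocessing is common in Pong policy-gradient demos), not details confirmed by the paper.

```python
def preprocess(frame, step=2, background=0):
    """Downsample a 2-D frame by keeping every `step`-th row and column,
    then map background pixels to 0 and everything else to 1 (assumed scheme)."""
    return [[0 if px == background else 1 for px in row[::step]] for row in frame[::step]]


def flatten(img):
    """Turn a 2-D image into a flat list, row by row."""
    return [px for row in img for px in row]


def difference_input(prev_frame, cur_frame, step=2, background=0):
    """Network input: current preprocessed frame minus the previous one.
    On the first step (no previous frame) an all-zero vector is returned."""
    cur = flatten(preprocess(cur_frame, step, background))
    if prev_frame is None:
        return [0] * len(cur)
    prev = flatten(preprocess(prev_frame, step, background))
    return [c - p for c, p in zip(cur, prev)]
```

A moving ball then appears as a -1 where it was and a +1 where it is now, while static pixels cancel to 0.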
Contribution
It demonstrates how deep neural networks can learn to play Pong from raw pixels using policy gradient methods without handcrafted features, and analyzes internal network activations.
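The policy-gradient idea can be illustrated with a minimal REINFORCE (Monte Carlo policy gradient) sketch. It assumes a two-action UP/DOWN policy given by a single logistic unit over a flattened pixel vector; the function names, learning rate and discount factor are hypothetical, and this is not the paper's implementation, whose FFNN, CNN and A3C setups are larger.

```python
import math


def discount_rewards(rewards, gamma=0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1} for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def prob_up(w, b, x):
    """P(UP | x) under an assumed logistic policy with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def reinforce_update(w, b, episode, gamma=0.99, lr=0.01):
    """One REINFORCE step: theta += lr * sum_t G_t * grad log pi(a_t | x_t).

    episode: list of (x, went_up, reward) tuples collected with the current policy.
    """
    returns = discount_rewards([r for _, _, r in episode], gamma)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for (x, went_up, _), g in zip(episode, returns):
        y = 1.0 if went_up else 0.0
        p = prob_up(w, b, x)
        # For a Bernoulli (sigmoid) policy, d log pi(a|x) / d logit = y - p.
        coeff = g * (y - p)
        for i, xi in enumerate(x):
            grad_w[i] += coeff * xi
        grad_b += coeff
    # Gradient ascent on expected return.
    new_w = [wi + lr * gi for wi, gi in zip(w, grad_w)]
    return new_w, b + lr * grad_b
```

Actions followed by positive returns become more likely and those followed by negative returns less likely; practical implementations often also normalize the returns, which this sketch omits.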
Findings
Deep neural networks successfully learned to play Pong from pixel data.
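Several setups were tested and compared, including a feedforward neural network (FFNN), a convolutional neural network (CNN), and the asynchronous advantage actor-critic (A3C) method.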
Analysis of hidden node activations provided insights into the learning process.
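To make "hidden node activations" concrete, here is a hypothetical one-hidden-layer policy network whose forward pass returns the hidden activations alongside the UP probability, plus a helper that ranks the most active units for a given input. The ReLU hidden layer, the names and the ranking helper are assumptions for illustration; the paper's own analysis procedure is not described in this summary.

```python
import math


def relu(z):
    return max(0.0, z)


def forward(W1, b1, w2, b2, x):
    """Return (hidden activations, P(UP)) for a one-hidden-layer policy network.

    W1: one weight row per hidden unit; b1: hidden biases;
    w2, b2: hidden-to-output weights and bias (all hypothetical).
    """
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, 1.0 / (1.0 + math.exp(-logit))


def top_active_units(hidden, k=3):
    """Indices of the k hidden units with the largest activation."""
    return sorted(range(len(hidden)), key=lambda i: hidden[i], reverse=True)[:k]
```

Logging these activation vectors over many frames (for example, as the ball approaches the paddle) is one simple way to see which units respond to which game situations.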
Abstract
Activities in reinforcement learning (RL) revolve around learning the Markov decision process (MDP) model, in particular the following parameters: state values, V; state-action values, Q; and policy, π. These parameters are commonly implemented as arrays. Scaling up the problem means scaling up the size of the arrays, which quickly leads to a computational bottleneck. To get around this, the RL problem is commonly formulated to learn a specific task using hand-crafted input features that curb the size of the array. In this report, we discuss an alternative end-to-end Deep Reinforcement Learning (DRL) approach in which the DRL agent attempts to learn general task representations, which in our context means learning to play Pong from a sequence of screen snapshots without game-specific hand-crafted features. We apply artificial neural networks (ANN) to approximate a policy of…
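The array-scaling argument above can be made concrete with a rough count. Assuming a hypothetical 80×80 binarized frame (an illustrative size, not taken from the paper), a table indexed by raw frames would need 2^6400 entries, whereas a fully connected network with one hidden layer of a hypothetical 200 units needs only about 1.3 million parameters.

```python
import math


def table_size_digits(height, width, levels=2):
    """Decimal digits in levels**(height*width): the number of distinct
    frames a lookup table indexed by raw pixels would have to cover."""
    return math.floor(height * width * math.log10(levels)) + 1


def ffnn_param_count(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases of a fully connected net with one hidden layer."""
    return n_inputs * n_hidden + n_hidden + n_hidden * n_outputs + n_outputs


# Hypothetical sizes: an 80x80 binarized frame and 200 hidden units.
print(table_size_digits(80, 80))       # 1927
print(ffnn_param_count(80 * 80, 200))  # 1280401
```

The table grows exponentially with the number of pixels while the network grows linearly, which illustrates the bottleneck described in the abstract.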
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
Methods: Convolution
