Reward Shaping Using Convolutional Neural Network
Hani Sami, Hadi Otrok, Jamal Bentahar, Azzam Mourad, Ernesto Damiani

TL;DR
This paper introduces VIN-RS, a CNN-based reward shaping method that estimates the environment's transition matrix and speeds up learning in tabular, Atari, and MuJoCo environments.
Contribution
It presents a novel potential-based reward shaping mechanism built on a CNN trained on environment data. The CNN automatically infers the environment's transition matrix, and the resulting estimated MDP is used to construct the potential function.
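Potential-based reward shaping adds a term F(s, s') = γΦ(s') − Φ(s) to the environment reward, where Φ is the potential function. A minimal Python sketch of that shaping term follows. The function name and the convention of a zero potential for terminal successors are illustrative assumptions. In VIN-RS the potential would come from the CNN, which is not reproduced here.

```python
def shaped_reward(reward, phi_s, phi_next, gamma, terminal=False):
    """Return r + F(s, s') with F(s, s') = gamma * Phi(s') - Phi(s).

    Assumption (illustrative): the potential of a terminal successor is
    treated as 0, a common convention for episodic tasks.
    """
    next_potential = 0.0 if terminal else phi_next
    return reward + gamma * next_potential - phi_s


# Over a trajectory the discounted shaping terms telescope to
# gamma**T * Phi(s_T) - Phi(s_0), which is why this form of shaping
# adds guidance without rewarding the agent for going in cycles.
potentials = [0.0, 1.0, 2.0, 3.0]   # hypothetical Phi(s_0..s_3)
gamma = 0.5
shaping_return = sum(
    gamma ** t * shaped_reward(0.0, potentials[t], potentials[t + 1], gamma)
    for t in range(len(potentials) - 1)
)
```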
Findings
Improves learning speed in tabular, Atari, and MuJoCo environments.
Achieves higher maximum cumulative rewards than existing methods.
Effectively estimates transition matrices from environment observations.
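VIN-RS estimates the transition matrix through a convolution filter. The sketch below illustrates the underlying idea in the style of value iteration networks: each action's local transition model is a 3x3 kernel, one Bellman backup is a convolution over the value map, and a max over the action channels gives the updated values. It is a sketch under stated assumptions, not the VIN-RS architecture: the kernels here are fixed by hand rather than learned, rewards are given per cell, and moves that leave the grid are clamped to the border. The names `conv_value_iteration`, `one_hot_kernel`, and `MOVES` are hypothetical.

```python
def conv_value_iteration(reward, kernels, gamma=0.9, iterations=100):
    """Value iteration where each action's transition model is a 3x3 kernel.

    kernels[a][di + 1][dj + 1] is the assumed probability that action a
    moves the agent by (di, dj). Off-grid moves are clamped to the border,
    so the agent stays against the wall.
    """
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        new_value = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                q_values = []
                for kernel in kernels:
                    expected = 0.0
                    # Convolution step: weight neighbouring values by the kernel.
                    for di in (-1, 0, 1):
                        for dj in (-1, 0, 1):
                            weight = kernel[di + 1][dj + 1]
                            if weight:
                                ni = min(max(i + di, 0), rows - 1)
                                nj = min(max(j + dj, 0), cols - 1)
                                expected += weight * value[ni][nj]
                    q_values.append(reward[i][j] + gamma * expected)
                # Max over action channels (VIN's max-pooling over actions).
                new_value[i][j] = max(q_values)
        value = new_value
    return value


def one_hot_kernel(di, dj):
    """Deterministic kernel: the action always moves the agent by (di, dj)."""
    kernel = [[0.0] * 3 for _ in range(3)]
    kernel[di + 1][dj + 1] = 1.0
    return kernel


# Hypothetical action set: up, down, left, right.
MOVES = [one_hot_kernel(-1, 0), one_hot_kernel(1, 0),
         one_hot_kernel(0, -1), one_hot_kernel(0, 1)]

# 1x4 corridor with the goal reward in the last cell.
corridor_values = conv_value_iteration([[0.0, 0.0, 0.0, 1.0]], MOVES,
                                       gamma=0.9, iterations=200)
```

On this corridor the values converge toward 7.29, 8.1, 9 and 10 from left to right. A value map of this kind is the sort of quantity that could serve as the potential Φ in the shaping term.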
Abstract
In this paper, we propose Value Iteration Network for Reward Shaping (VIN-RS), a potential-based reward shaping mechanism using a Convolutional Neural Network (CNN). The proposed VIN-RS embeds a CNN trained on labels computed using the message passing mechanism of the Hidden Markov Model. The CNN processes images or graphs of the environment to predict the shaping values. Recent work on reward shaping remains limited in training on a representation of the Markov Decision Process (MDP) and in building an estimate of the transition matrix. The advantage of VIN-RS is that it constructs an effective potential function from an estimated MDP while automatically inferring the environment transition matrix. The proposed VIN-RS estimates the transition matrix through a self-learned convolution filter while extracting environment details from the input frames or sampled graphs. Due to (1) the…
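The CNN in VIN-RS is trained on labels computed with the message passing mechanism of the Hidden Markov Model. The exact labelling scheme is not given in this summary, so the sketch below shows only generic HMM message passing (the forward-backward algorithm) in plain Python. It computes posterior state probabilities from an assumed transition matrix `A`, emission matrix `B` and initial distribution `pi`. All names are illustrative, and this is not the paper's label computation.

```python
def forward_backward(pi, A, B, observations):
    """Posterior marginals P(z_t = s | o_1..o_T) for a discrete HMM.

    pi[s]: initial probability, A[s][s2]: P(z_{t+1} = s2 | z_t = s),
    B[s][o]: P(o | z = s). Messages are normalised at every step for
    numerical stability; the posteriors are renormalised at the end.
    """
    n, T = len(pi), len(observations)

    # Forward messages: alpha_t(s) proportional to P(o_1..o_t, z_t = s).
    first = [pi[s] * B[s][observations[0]] for s in range(n)]
    total = sum(first)
    alpha = [[p / total for p in first]]
    for t in range(1, T):
        cur = [B[s][observations[t]] *
               sum(alpha[-1][r] * A[r][s] for r in range(n))
               for s in range(n)]
        total = sum(cur)
        alpha.append([c / total for c in cur])

    # Backward messages: beta_t(s) proportional to P(o_{t+1}..o_T | z_t = s).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        cur = [sum(A[s][s2] * B[s2][observations[t + 1]] * beta[t + 1][s2]
                   for s2 in range(n))
               for s in range(n)]
        total = sum(cur)
        beta[t] = [c / total for c in cur]

    # Combine the two messages into normalised posteriors.
    posteriors = []
    for t in range(T):
        joint = [alpha[t][s] * beta[t][s] for s in range(n)]
        total = sum(joint)
        posteriors.append([j / total for j in joint])
    return posteriors
```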
