Reward Shaping Using Convolutional Neural Network

Hani Sami; Hadi Otrok; Jamal Bentahar; Azzam Mourad; Ernesto Damiani

arXiv:2210.16956 · cs.AI · November 1, 2022 · 1 citation

Open Access

TL;DR

This paper introduces VIN-RS, a CNN-based potential-based reward shaping method that estimates the environment transition matrix and speeds up learning in tabular, Atari, and MuJoCo environments.

Contribution

It presents a potential-based reward shaping mechanism in which a CNN, trained on labels computed with the message passing mechanism of the Hidden Markov Model, predicts the shaping values while automatically inferring the environment transition matrix.

Findings

01. Improves learning speed in tabular, Atari, and MuJoCo environments.

02. Achieves higher maximum cumulative rewards than existing methods.

03. Effectively estimates transition matrices from environment observations.

Abstract

In this paper, we propose Value Iteration Network for Reward Shaping (VIN-RS), a potential-based reward shaping mechanism using a Convolutional Neural Network (CNN). The proposed VIN-RS embeds a CNN trained on labels computed using the message passing mechanism of the Hidden Markov Model. The CNN processes images or graphs of the environment to predict the shaping values. Recent work on reward shaping is still limited in training on a representation of the Markov Decision Process (MDP) and in building an estimate of the transition matrix. The advantage of VIN-RS is that it constructs an effective potential function from an estimated MDP while automatically inferring the environment transition matrix. The proposed VIN-RS estimates the transition matrix through a self-learned convolution filter while extracting environment details from the input frames or sampled graphs. Due to (1) the…
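As a rough illustration of what "potential-based reward shaping" means, the sketch below uses the standard shaping term F(s, s') = γΦ(s') − Φ(s). It is not VIN-RS: the learned CNN and the estimated transition matrix are replaced here by plain value iteration on a small grid whose deterministic four-neighbour dynamics are assumed to be known. The function names, the grid layout, and the discount value are hypothetical choices made for the example.

```python
import numpy as np

GAMMA = 0.9  # discount factor (assumed for this example)


def value_iteration_potential(reward_map, gamma=GAMMA, iters=50):
    """Return a potential Phi(s) for every grid cell.

    Assumes deterministic moves to the four neighbouring cells.
    VIN-RS would instead predict these values with a trained CNN.
    """
    v = np.zeros_like(reward_map, dtype=float)
    for _ in range(iters):
        # Pad with -inf so that moves off the grid are never selected.
        padded = np.pad(v, 1, constant_values=-np.inf)
        neighbours = np.stack([
            padded[:-2, 1:-1],   # value of the cell above
            padded[2:, 1:-1],    # value of the cell below
            padded[1:-1, :-2],   # value of the cell to the left
            padded[1:-1, 2:],    # value of the cell to the right
        ])
        v = reward_map + gamma * neighbours.max(axis=0)
    return v


def shaping(phi, s, s_next, gamma=GAMMA):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi[s_next] - phi[s]


# A 4x4 grid with a single rewarding cell in the bottom-right corner.
reward_map = np.zeros((4, 4))
reward_map[3, 3] = 1.0
phi = value_iteration_potential(reward_map)

toward = shaping(phi, (3, 1), (3, 2))  # step toward the rewarding cell
away = shaping(phi, (3, 1), (3, 0))    # step away from it
print(f"shaping toward goal: {toward:.3f}, away from goal: {away:.3f}")
```

The shaped reward the agent trains on would be the environment reward plus this term. Because the term is a discounted difference of potentials, its discounted sum along any trajectory collapses to γ^T Φ(s_T) − Φ(s_0), which is why this form of shaping leaves the optimal policy unchanged.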

Taxonomy

Topics: Human Pose and Action Recognition · Adversarial Robustness in Machine Learning · Action Observation and Synchronization

Methods: Convolution · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings