Shaping Proto-Value Functions via Rewards

Chandrashekar Lakshmi Narayanan; Raj Kumar Maity; Shalabh Bhatnagar

arXiv:1511.08589·cs.AI·November 30, 2015

Open Access

TL;DR

This paper introduces reward-dependent proto-value functions (RPVFs), which combine reward shaping with proto-value functions (PVFs) and improve learning, especially when the state space is symmetrical but the rewards are asymmetrical.

Contribution

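The paper combines task-dependent reward shaping with task-independent PVFs to build RPVFs, using the immediate rewards observed during sampling, which PVF construction does not use, so that the basis better captures reward asymmetries.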

Findings

1. RPVFs outperform PVFs and reward shaping alone in experiments.

2. RPVFs capture asymmetries in the reward distribution better than PVFs.

3. Learning improves with RPVFs when the state space is symmetrical and the rewards are asymmetrical.

Abstract

In this paper, we combine task-dependent reward shaping and task-independent proto-value functions (PVFs) to obtain reward-dependent proto-value functions (RPVFs). In constructing the RPVFs, we make use of the immediate rewards, which are available during the sampling phase but are not used in the PVF construction. We show via experiments that learning with an RPVF-based representation is better than learning with just reward shaping or PVFs. In particular, when the state space is symmetrical and the rewards are asymmetrical, the RPVFs capture the asymmetry better than the PVFs.
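The symmetry point in the abstract can be illustrated with a small sketch. PVFs are commonly computed as the smoothest eigenvectors of a graph Laplacian built on the state space, so on a symmetric chain they are mirror-symmetric up to sign regardless of where the reward sits. The abstract does not spell out how RPVFs are constructed, so the reward weighting below (`reward_weighted_adjacency`, with its `beta` parameter) is a hypothetical stand-in chosen only to show how reward information can break that symmetry. It is not the authors' method. The chain size, reward placement, and all function names are likewise assumptions.

```python
import numpy as np


def chain_adjacency(n):
    """Unweighted adjacency matrix of an n-state chain (path graph)."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A


def laplacian_basis(W, k):
    """First k eigenvectors of the combinatorial Laplacian L = D - W.

    np.linalg.eigh returns eigenvalues in ascending order, so the first
    k columns are the smoothest functions on the graph.
    """
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :k]


def reward_weighted_adjacency(A, rewards, beta=1.0):
    """HYPOTHETICAL weighting, not taken from the paper.

    Each edge is scaled by exp(beta * mean reward of its two endpoints),
    so edges near rewarding states get larger weights.
    """
    r = np.asarray(rewards, dtype=float)
    scale = np.exp(beta * 0.5 * (r[:, None] + r[None, :]))
    return A * scale


def mirror_symmetric(v, atol=1e-8):
    """True if v equals its own reflection up to sign, elementwise."""
    return np.allclose(np.abs(v), np.abs(v[::-1]), atol=atol)


n = 7
A = chain_adjacency(n)

# Asymmetric reward: only the right-most state is rewarding.
rewards = np.zeros(n)
rewards[-1] = 1.0

pvf = laplacian_basis(A, 3)
rpvf_like = laplacian_basis(reward_weighted_adjacency(A, rewards), 3)

# Column 0 is the constant eigenvector in both cases; column 1 is the
# first informative basis function.
print("PVF column 1 mirror-symmetric:      ", mirror_symmetric(pvf[:, 1]))
print("Reward-weighted column 1 symmetric: ", mirror_symmetric(rpvf_like[:, 1]))
```

Under these assumptions, the plain Laplacian basis cannot distinguish the rewarding end of the chain from the other end, while the reward-weighted basis can. That is the kind of asymmetry the abstract says RPVFs capture better than PVFs.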

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Neural Networks and Applications