Delayed Rewards Calibration via Reward Empirical Sufficiency

Yixuan Liu, Hu Wang, Xiaowei Wang, Xiaoyue Sun, Liuyue Jiang, and Minhui Xue

arXiv:2102.10527 · cs.LG · August 26, 2021

TL;DR

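This paper introduces a delayed-reward calibration method for reinforcement learning. A classifier identifies empirically sufficient states, meaning states that lead the agent to an environmental reward in subsequent steps, and issues calibrated rewards at those states, which makes rewards more timely and accurate and improves training efficiency.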

Contribution

The paper proposes a calibration paradigm based on empirical sufficiency: a classifier extracts sufficient states so that the reward signal is delivered at the states that lead to it rather than only when the delayed environmental reward arrives.

Findings

01  Classifier accurately generates calibrated rewards

02  Improved training efficiency in RL models

03  Sufficient states align with human observations

Abstract

Appropriate credit assignment for delayed rewards is a fundamental challenge in reinforcement learning. To tackle this problem, we introduce a delayed-reward calibration paradigm inspired by a classification perspective. We hypothesize that well-represented state vectors share similarities with each other, since they contain the same or equivalent essential information. To this end, we define an empirical sufficient distribution, where the state vectors within the distribution will lead agents to environmental reward signals in the subsequent steps. A purify-trained classifier is then designed to obtain the distribution and generate the calibrated rewards. We examine the correctness of sufficient state extraction by tracking the real-time extraction and building different reward functions in environments. The results demonstrate that the classifier could generate timely and accurate…
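The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`label_states`, `fit_logistic`, `calibrated_rewards`), the `horizon`, `threshold`, and `bonus` parameters, and the use of plain logistic regression are all assumptions made for illustration, and the paper's purified training procedure is not reproduced. The sketch labels a state as positive when an environmental reward arrives within a few subsequent steps, fits a classifier on those labels, and emits a calibrated reward whenever the classifier is confident a state is sufficient.

```python
import math


def label_states(rewards, horizon):
    """Label step t as 1 if a nonzero environment reward arrives
    within the next `horizon` steps (hypothetical labeling rule)."""
    labels = []
    for t in range(len(rewards)):
        window = rewards[t + 1 : t + 1 + horizon]
        labels.append(1 if any(r != 0 for r in window) else 0)
    return labels


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Classifier's estimate that state x is sufficient."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(states, labels, lr=0.5, epochs=2000):
    """Plain logistic regression by batch gradient descent.
    Stands in for the paper's classifier; an assumption of this sketch."""
    dim = len(states[0])
    w, b = [0.0] * dim, 0.0
    n = len(states)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(states, labels):
            err = predict(x, w, b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def calibrated_rewards(states, w, b, threshold=0.9, bonus=1.0):
    """Emit `bonus` at states the classifier marks as sufficient,
    and 0 elsewhere. Threshold and bonus are illustrative values."""
    return [bonus if predict(x, w, b) >= threshold else 0.0 for x in states]
```

In use, the calibrated rewards would be fed to the learner in place of, or alongside, the delayed environmental reward, so that credit is assigned at the step where the state becomes sufficient. How the paper combines the two signals is not stated in the abstract.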

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Neural and Behavioral Psychology Studies