Delayed Rewards Calibration via Reward Empirical Sufficiency
Yixuan Liu, Hu Wang, Xiaowei Wang, Xiaoyue Sun, Liuyue Jiang, and Minhui Xue

TL;DR
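This paper introduces a delayed-reward calibration method for reinforcement learning. A classifier identifies empirically sufficient states, meaning states that lead the agent to an environmental reward in subsequent steps, and generates calibrated rewards at those states. The authors report more timely and accurate rewards and improved training efficiency.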
Contribution
It proposes a calibration paradigm based on empirical sufficiency: a classifier extracts sufficient states from the agent's experience, so that rewards are issued at the states that lead to them rather than only when the environment finally signals.
Findings
Classifier accurately generates calibrated rewards
Improved training efficiency in RL models
Sufficient states align with human observations
Abstract
Appropriate credit assignment for delayed rewards is a fundamental challenge for reinforcement learning. To tackle this problem, we introduce a delayed-reward calibration paradigm inspired by a classification perspective. We hypothesize that well-represented state vectors share similarities with each other since they contain the same or equivalent essential information. To this end, we define an empirical sufficient distribution, where the state vectors within the distribution will lead agents to environmental reward signals in the subsequent steps. Therefore, a purify-trained classifier is designed to obtain the distribution and generate the calibrated rewards. We examine the correctness of sufficient state extraction by tracking the real-time extraction and building different reward functions in environments. The results demonstrate that the classifier could generate timely and accurate…
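The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the labeling window (`horizon`), the plain logistic-regression classifier (swapped in for the paper's purify-trained classifier, whose details are not given here), the 0.5 threshold, and all function names are assumptions chosen for illustration. States that precede an environmental reward within a few steps are labeled positive, a classifier is fit on those labels, and the classifier's decision is then used as a calibrated reward.

```python
import numpy as np

def label_states(rewards, horizon):
    """Label step t as 1 if an environmental reward arrives at t or within
    `horizon` steps after t. `horizon` is an illustrative assumption."""
    rewards = np.asarray(rewards, dtype=float)
    labels = np.zeros(len(rewards), dtype=int)
    for t in np.flatnonzero(rewards > 0):
        labels[max(0, t - horizon): t + 1] = 1
    return labels

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic-regression classifier with batch gradient descent.
    Stand-in for the paper's classifier; not the purify-training procedure."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad = _sigmoid(X @ w + b) - y      # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def calibrated_reward(state, w, b, threshold=0.5):
    """Return 1.0 when the classifier judges the state sufficient, else 0.0."""
    p = _sigmoid(np.dot(np.asarray(state, dtype=float), w) + b)
    return 1.0 if p >= threshold else 0.0
```

In use, `label_states` would be applied per trajectory, the labeled state vectors pooled to fit `train_logistic`, and `calibrated_reward` queried at every step during training so the agent receives a signal as soon as it reaches a state the classifier recognizes, rather than waiting for the environment's delayed reward.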
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Neural and Behavioral Psychology Studies
