StepScorer: Accelerating Reinforcement Learning with Step-wise Scoring and Psychological Regret Modeling
Zhe Xu

TL;DR
This paper presents StepScorer, which accelerates reinforcement learning by using step-wise regret signals inspired by human psychology, leading to faster convergence especially in environments with sparse or delayed rewards.
Contribution
Introduces the Psychological Regret Model (PRM), a novel method that transforms sparse rewards into dense, step-wise feedback signals to speed up reinforcement learning.
Findings
PRM reaches stable performance approximately 36% faster than PPO in Lunar Lander.
Effective in continuous control and delayed feedback environments.
Bridges behavioral economics and reinforcement learning through regret modeling.
Abstract
Reinforcement learning algorithms often suffer from slow convergence due to sparse reward signals, particularly in complex environments where feedback is delayed or infrequent. This paper introduces the Psychological Regret Model (PRM), a novel approach that accelerates learning by incorporating regret-based feedback signals after each decision step. Rather than waiting for terminal rewards, PRM computes a regret signal based on the difference between the expected value of the optimal action and the value of the action taken in each state. This transforms sparse rewards into dense feedback signals through a step-wise scoring framework, enabling faster convergence. We demonstrate that PRM achieves stable performance approximately 36% faster than traditional Proximal Policy Optimization (PPO) in benchmark environments such as Lunar Lander. Our results indicate that PRM is particularly…
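The regret signal described in the abstract can be illustrated with a minimal Python sketch. It assumes per-action value estimates for the current state are already available (for example, from a learned critic). The function names `stepwise_regret` and `shaped_reward`, the `regret_weight` parameter, and the rule of subtracting weighted regret from the environment reward are all hypothetical. The abstract does not say how the regret signal enters the policy update, so this is one plausible reading, not the paper's implementation.

```python
def stepwise_regret(q_values, action):
    """Regret for one step: value of the best action minus value of the action taken.

    q_values: estimated action values for the current state (assumed given).
    action:   index of the action actually taken.
    """
    return max(q_values) - q_values[action]


def shaped_reward(env_reward, q_values, action, regret_weight=0.1):
    """Hypothetical dense reward: penalize the environment reward by weighted regret.

    This shaping rule and the default regret_weight are assumptions made for
    illustration only; they are not taken from the paper.
    """
    return env_reward - regret_weight * stepwise_regret(q_values, action)


# Example: three actions with estimated values 1.0, 3.0, 2.0.
q = [1.0, 3.0, 2.0]
regret_suboptimal = stepwise_regret(q, 0)  # chose the worst action
regret_optimal = stepwise_regret(q, 1)     # chose the best action
dense = shaped_reward(0.0, q, 0, regret_weight=0.5)
```

Under this sketch, a step that takes the highest-valued action has zero regret, so even when the environment reward is zero for most of an episode, suboptimal steps still receive a nonzero feedback signal.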
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Artificial Intelligence in Games
