StepScorer: Accelerating Reinforcement Learning with Step-wise Scoring and Psychological Regret Modeling

Zhe Xu

arXiv:2602.03171·cs.LG·February 4, 2026

Open Access

TL;DR

This paper presents StepScorer, which accelerates reinforcement learning with step-wise regret signals inspired by human psychology, leading to faster convergence, especially in environments with sparse or delayed rewards.

Contribution

Introduces the Psychological Regret Model (PRM), a novel method that transforms sparse rewards into dense, step-wise feedback signals to speed up reinforcement learning.

Findings

01 PRM reaches stable performance approximately 36% faster than PPO in Lunar Lander.

02 PRM is effective in continuous control and delayed-feedback environments.

03 PRM bridges behavioral economics and reinforcement learning through regret modeling.

Abstract

Reinforcement learning algorithms often suffer from slow convergence due to sparse reward signals, particularly in complex environments where feedback is delayed or infrequent. This paper introduces the Psychological Regret Model (PRM), a novel approach that accelerates learning by incorporating regret-based feedback signals after each decision step. Rather than waiting for terminal rewards, PRM computes a regret signal based on the difference between the expected value of the optimal action and the value of the action taken in each state. This transforms sparse rewards into dense feedback signals through a step-wise scoring framework, enabling faster convergence. We demonstrate that PRM achieves stable performance approximately 36% faster than traditional Proximal Policy Optimization (PPO) in benchmark environments such as Lunar Lander. Our results indicate that PRM is particularly…
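
The regret signal described in the abstract (the value of the best action minus the value of the action actually taken) can be illustrated with a minimal Python sketch. This is not the paper's implementation. It assumes a small tabular array of action-value estimates, whereas the paper presumably uses a learned estimator. The function names `stepwise_regret` and `shaped_rewards`, the `regret_weight` parameter, and the choice to subtract the weighted regret from the environment reward are all hypothetical and chosen only for illustration.

```python
import numpy as np


def stepwise_regret(q_values: np.ndarray, action: int) -> float:
    """Regret for one step: max_a Q(s, a) - Q(s, action).

    The result is non-negative and is zero when the chosen action
    is the best one under the current value estimates.
    """
    return float(np.max(q_values) - q_values[action])


def shaped_rewards(env_rewards, q_table, states, actions, regret_weight=0.1):
    """Turn a sparse reward sequence into a dense one.

    Assumption (not from the paper): each step's reward is the
    environment reward minus a weighted regret penalty.
    """
    shaped = []
    for reward, state, action in zip(env_rewards, states, actions):
        penalty = regret_weight * stepwise_regret(q_table[state], action)
        shaped.append(reward - penalty)
    return shaped


# Toy example: 2 states, 2 actions, reward only at the final step.
q_table = np.array([[1.0, 3.0],
                    [0.5, 0.5]])
dense = shaped_rewards(
    env_rewards=[0.0, 0.0, 1.0],
    q_table=q_table,
    states=[0, 0, 1],
    actions=[0, 1, 0],
    regret_weight=0.5,
)
```

In this toy trajectory only the first step picks a suboptimal action, so only that step receives a non-zero penalty, while the final step keeps its terminal reward. Every step therefore carries a feedback signal even though the environment reward is sparse.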

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Artificial Intelligence in Games