K-Score: Kalman Filter as a Principled Alternative to Reward Normalization in Reinforcement Learning

Zixuan Xia; Quanxi Li

arXiv:2604.23056·cs.LG·April 28, 2026

TL;DR

This paper introduces K-Score, a Kalman filter-based alternative to standard reward normalization in policy gradient reinforcement learning, which accelerates convergence and reduces training variance without modifying existing policy architectures.

Contribution

It presents a simple 1D Kalman filter that recursively estimates the latent reward mean online, replacing fixed normalization heuristics, smoothing high-variance returns, and adapting to non-stationary environments in policy gradient methods.

Findings

01

Kalman-filtered rewards outperform standard normalization in LunarLander and CartPole.

02

The method accelerates convergence and reduces training variance.

03

Code is publicly available at https://github.com/Sumxiaa/Kalman_Normalization.

Abstract

We propose a simple yet effective alternative to reward normalization in policy gradient reinforcement learning by integrating a 1D Kalman filter for online reward estimation. Instead of relying on fixed heuristics, our method recursively estimates the latent reward mean, smoothing high-variance returns and adapting to non-stationary environments. This approach incurs minimal overhead and requires no modification to existing policy architectures. Experiments on LunarLander and CartPole demonstrate that Kalman-filtered rewards significantly accelerate convergence and reduce training variance compared to standard normalization techniques. Code is available at https://github.com/Sumxiaa/Kalman_Normalization.
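To make the idea of recursively estimating a latent reward mean concrete, here is a minimal sketch of a generic scalar Kalman filter with a random-walk state model. It is not taken from the authors' repository: the class name `ScalarKalman`, the helper `center_returns`, the noise variances `q` and `r`, and the choice to subtract the estimate from each return are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Generic 1D Kalman filter tracking a latent mean (illustrative, not the paper's code)."""
    q: float = 1e-3    # process noise variance (assumed value)
    r: float = 1.0     # observation noise variance (assumed value)
    mean: float = 0.0  # current estimate of the latent mean
    var: float = 1.0   # variance (uncertainty) of that estimate

    def update(self, observation: float) -> float:
        # Predict: under a random-walk model the mean is unchanged,
        # and only the uncertainty grows by the process noise.
        prior_var = self.var + self.q
        # Correct: the Kalman gain weighs the observation against the prediction.
        gain = prior_var / (prior_var + self.r)
        self.mean = self.mean + gain * (observation - self.mean)
        self.var = (1.0 - gain) * prior_var
        return self.mean


def center_returns(returns, kf):
    """Hypothetical usage: subtract the running estimate from each return."""
    centered = []
    for g in returns:
        baseline = kf.update(g)  # the estimate already includes this return
        centered.append(g - baseline)
    return centered
```

In this sketch a larger `q` lets the estimate follow non-stationary rewards more quickly, while a larger `r` smooths more aggressively. How the paper actually sets these quantities, and how it applies the filtered estimate inside the policy gradient update, should be checked against the linked repository.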

Code & Models

Repositories

Sumxiaa/Kalman_Normalization (GitHub): https://github.com/Sumxiaa/Kalman_Normalization
