TL;DR
This paper introduces K-Score, a Kalman filter-based method for reward normalization in reinforcement learning, which improves convergence and stability without modifying existing architectures.
Contribution
It presents a novel, simple Kalman filter-based reward normalization technique that adapts online, reducing variance and accelerating training in policy gradient methods.
Findings
Kalman-filtered rewards outperform standard normalization in LunarLander and CartPole.
The method accelerates convergence and reduces training variance.
Code is publicly available at https://github.com/Sumxiaa/Kalman_Normalization.
Abstract
We propose a simple yet effective alternative to reward normalization in policy gradient reinforcement learning by integrating a 1D Kalman filter for online reward estimation. Instead of relying on fixed heuristics, our method recursively estimates the latent reward mean, smoothing high-variance returns and adapting to non-stationary environments. This approach incurs minimal overhead and requires no modification to existing policy architectures. Experiments on LunarLander and CartPole demonstrate that Kalman-filtered rewards significantly accelerate convergence and reduce training variance compared to standard normalization techniques. Code is available at https://github.com/Sumxiaa/Kalman_Normalization.
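To make the idea in the abstract concrete, here is a minimal sketch of a 1D Kalman filter that recursively estimates a latent reward mean. It is not taken from the authors' repository: the class name `ScalarKalmanFilter`, the helper `filter_rewards`, the random-walk model for the latent mean, and the default values of `process_var` and `obs_var` are all assumptions chosen for illustration. How the filtered value enters the policy gradient loss is likewise not specified in the abstract, so the sketch stops at producing the smoothed rewards.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalmanFilter:
    """Hypothetical 1D Kalman filter tracking a slowly drifting reward mean."""
    mean: float = 0.0          # current estimate of the latent reward mean
    var: float = 1.0           # variance (uncertainty) of that estimate
    process_var: float = 1e-3  # assumed per-step drift of the latent mean
    obs_var: float = 1.0       # assumed noise variance of each observed reward

    def update(self, reward: float) -> float:
        # Predict: under a random-walk model the mean is unchanged
        # and the uncertainty grows by the process variance.
        prior_var = self.var + self.process_var
        # Correct: the Kalman gain weighs the new reward against the prior.
        gain = prior_var / (prior_var + self.obs_var)
        self.mean += gain * (reward - self.mean)
        self.var = (1.0 - gain) * prior_var
        return self.mean


def filter_rewards(rewards, kf):
    """Return the running Kalman estimate after each observed reward."""
    return [kf.update(r) for r in rewards]


if __name__ == "__main__":
    kf = ScalarKalmanFilter()
    noisy = [1.0, 3.0, -2.0, 4.0, 0.5, 2.5]
    print(filter_rewards(noisy, kf))
```

A larger `process_var` makes the estimate follow recent rewards more closely, which suits non-stationary environments; a larger `obs_var` smooths more aggressively. Both would need tuning per task.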
