On Reward-Balancing Methods for Reinforcement Learning
Simone Baroncini, Bahman Gharesifard, Giuseppe Notarstefano

TL;DR
This paper explores reward-balancing methods in reinforcement learning, providing a theoretical and control-theoretic analysis, extending to stochastic models, and demonstrating improved performance via simulations.
Contribution
It introduces a novel optimal control framework for reward-balancing in RL, including stochastic extensions and performance validation through simulations.
Findings
Theoretical analysis of reward transformations and their algebraic structure.
Extension of reward-balancing to stochastic model sampling with probabilistic guarantees.
Simulation results show performance improvements over existing methods.
Abstract
This paper investigates the so-called reward-balancing methods, a novel class of algorithms for solving discounted-return reinforcement learning (RL) problems. These methods iteratively adjust the reward function to transform the RL problem into an equivalent one in which the optimal policies are greedy. For this procedure, referred to as the normalization process, we provide a theoretical analysis of the transformations involved, emphasizing their algebraic structure. Then, we introduce a control-theoretic reformulation, recasting the reward-balancing procedure into an optimal control framework. The approach is further extended to address model uncertainty through stochastic model sampling, yielding normalization guarantees and probabilistic bounds on stochastic fluctuations. Using the proposed optimal control framework within a scenario model predictive control (MPC)…
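To make the idea of a normalization process concrete, here is a minimal Python sketch for a small tabular MDP with a known model. It is an illustration under stated assumptions, not the authors' algorithm: the update used is the classical potential-based reward shift r'(s,a) = r(s,a) - δ(s) + γ Σ_s' P(s'|s,a) δ(s'), which is known to preserve optimal policies, with δ(s) chosen as the current per-state maximum reward. The function name `balance_rewards`, the array layout, and the random test MDP are all hypothetical.

```python
import numpy as np


def balance_rewards(P, r, gamma, tol=1e-10, max_iters=10_000):
    """Iteratively shift rewards until max_a r(s, a) is ~0 in every state.

    P: array of shape (S, A, S), P[s, a, s2] = transition probability.
    r: array of shape (S, A), immediate rewards.
    Returns the balanced rewards and the accumulated shift phi.
    Assumption: a plain potential-based shift, not the paper's exact operator.
    """
    r = np.asarray(r, dtype=float).copy()
    phi = np.zeros(r.shape[0])  # total shift applied to each state so far
    for _ in range(max_iters):
        delta = r.max(axis=1)  # best immediate reward in each state
        if np.abs(delta).max() < tol:
            break
        # Subtract delta at the current state and add back its discounted
        # expectation at the next state; action advantages are unchanged.
        r = r - delta[:, None] + gamma * (P @ delta)
        phi += delta
    return r, phi


# Hypothetical random MDP, used only to exercise the sketch.
rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
r0 = rng.standard_normal((S, A))

r_bal, phi = balance_rewards(P, r0, gamma)
greedy_policy = r_bal.argmax(axis=1)  # greedy w.r.t. the balanced rewards
```

With this particular choice of shift, the accumulated `phi` follows the value-iteration recursion, so once the loop stops every state has a maximum balanced reward of approximately zero and the greedy policy with respect to `r_bal` is optimal for the original rewards. The stochastic-sampling and scenario-MPC extensions mentioned in the abstract are not covered by this sketch.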
