AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning
Yaomin Wang, Jianting Pan, Ran Tian, Xiaoyang Li, Yu Zhang, Hengle Qin, Tianshu YU

TL;DR
AdaGamma introduces a practical method for learning state-dependent discount factors in deep reinforcement learning, improving performance and stability across benchmarks and real-world tests.
Contribution
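The paper proposes AdaGamma, a deep actor-critic method that learns a state-dependent discount function jointly with a return-consistency objective that regularizes the induced backup structure, addressing the instability of naive state-dependent discounting in deep RL.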
Findings
AdaGamma improves performance on continuous-control benchmarks.
It achieves significant gains in an online logistics platform test.
The method maintains stable learning with state-dependent discounting.
Abstract
The discount factor in reinforcement learning controls both the effective planning horizon and the strength of bootstrapping, yet most deep RL methods use a single fixed value across all states. While state-dependent discounting is conceptually appealing, naive deep actor-critic implementations can become unstable and degenerate toward TD-error collapse. We propose AdaGamma, a practical deep actor-critic method for state-dependent discounting that learns a state-dependent discount function together with a return-consistency objective to regularize the induced backup structure. On the theory side, we analyze the Bellman operator induced by state-dependent discounting and establish its basic well-posedness properties under suitable conditions. Empirically, AdaGamma integrates into both SAC and PPO, yielding consistent improvements on continuous-control benchmarks, and achieves…
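To make the idea of a Bellman backup with a state-dependent discount concrete, here is a minimal tabular sketch. It is not the AdaGamma algorithm: the three-state chain, the discount values, the helper names `backup` and `evaluate`, and the convention of discounting by the current state's factor are all illustrative assumptions. It also assumes every discount is strictly below 1, a standard sufficient condition for the backup to be a contraction in the sup norm, which may differ from the conditions used in the paper.

```python
# Sketch only: tabular policy evaluation with a state-dependent discount.
# The chain, rewards, and gamma values are hypothetical, chosen for illustration.

def backup(values, rewards, next_state, gamma):
    """Apply (T V)(s) = r(s) + gamma(s) * V(next_state[s]) once to every state."""
    return [
        rewards[s] + gamma[s] * values[next_state[s]]
        for s in range(len(values))
    ]

def evaluate(rewards, next_state, gamma, tol=1e-10, max_iters=10_000):
    """Iterate the backup until the largest per-state change falls below tol."""
    values = [0.0] * len(rewards)
    for _ in range(max_iters):
        new_values = backup(values, rewards, next_state, gamma)
        gap = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if gap < tol:
            break
    return values

# Deterministic chain 0 -> 1 -> 2 -> 2, with reward only in the absorbing state.
rewards = [0.0, 0.0, 1.0]
next_state = [1, 2, 2]
# Assumed discounts: far-sighted early in the chain, short-sighted at the end.
gamma = [0.99, 0.9, 0.5]

values = evaluate(rewards, next_state, gamma)
print(values)
```

With a single fixed discount, all three states would share one horizon. Here the absorbing state bootstraps weakly on itself while the earlier states propagate its value almost undiminished, which is the kind of per-state control over horizon and bootstrapping strength that the abstract describes.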
