Correcting discount-factor mismatch in on-policy policy gradient methods
Fengdi Che, Gautham Vasan, A. Rupam Mahmood

TL;DR
This paper identifies a discrepancy in on-policy policy gradient methods related to discounting, proposes a novel correction to improve learning stability and performance, and demonstrates its effectiveness on standard benchmarks.
Contribution
It introduces a new distribution correction method for policy gradients that addresses discount factor mismatch, improving stability and performance.
Findings
The correction has lower variance than the existing discount-factor correction.
The method improves policy performance on OpenAI Gym and DeepMind benchmarks.
It helps avoid suboptimal policies in environments with similar states.
Abstract
The policy gradient theorem gives a convenient form of the policy gradient in terms of three factors: an action value, a gradient of the action likelihood, and a state distribution involving discounting called the \emph{discounted stationary distribution}. But commonly used on-policy methods based on the policy gradient theorem ignore the discount factor in the state distribution, which is technically incorrect and may even cause degenerate learning behavior in some environments. An existing solution corrects this discrepancy by using $\gamma^t$ as a factor in the gradient estimate. However, this solution is not widely adopted and does not work well in tasks where the later states are similar to earlier states. We introduce a novel distribution correction to account for the discounted stationary distribution that can be plugged into many existing gradient estimators. Our correction…
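To make the mismatch concrete, here is a minimal sketch of the per-step coefficients that multiply the score term (the gradient of the log action likelihood) in a REINFORCE-style estimate. It assumes the existing correction mentioned in the abstract is the familiar $\gamma^t$ weighting of each time step. It does not implement the paper's new correction, which the truncated abstract does not specify. The function names `discounted_returns` and `score_weights` and the toy reward sequence are hypothetical, chosen only for illustration.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Return G_t = sum_{k >= t} gamma^(k - t) * r_k for every step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Accumulate backwards so each step reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def score_weights(rewards, gamma, use_gamma_t=False):
    """Coefficient on grad log pi(a_t | s_t) at each step of one episode.

    use_gamma_t=False: the common estimator, which drops the discount
                       from the state distribution.
    use_gamma_t=True:  assumed form of the existing correction, which
                       scales step t by gamma^t.
    """
    weights = discounted_returns(rewards, gamma)
    if use_gamma_t:
        weights = (gamma ** np.arange(len(rewards))) * weights
    return weights


# Toy episode (illustrative values only).
rewards = [1.0, 1.0, 1.0]
uncorrected = score_weights(rewards, gamma=0.5)
corrected = score_weights(rewards, gamma=0.5, use_gamma_t=True)
print(uncorrected)  # [1.75 1.5  1.  ]
print(corrected)    # [1.75 0.75 0.25]
```

With the $\gamma^t$ factor, later steps contribute geometrically less to the update, which is one way to see why such a correction can struggle when later states resemble earlier ones: samples that carry useful information about those states are heavily down-weighted.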
Taxonomy
Topics: Age of Information Optimization · Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data
