Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization

Hua Zheng; Wei Xie

arXiv:2205.02976·cs.LG·June 22, 2022·1 cites

TL;DR

This paper introduces a variance reduction technique that reuses partial trajectories through importance sampling to improve policy gradient optimization in infinite-horizon MDPs, enabling faster convergence and better performance.

Contribution

It proposes a novel variance reduction experience replay method that selectively reuses relevant partial trajectories using mixture likelihood ratios, enhancing policy learning efficiency.

Findings

01. Improved convergence speed of policy optimization algorithms.

02. Enhanced performance of actor-critic and proximal policy optimization methods.

03. Effective reuse of partial trajectories in low-data and online settings.

Abstract

Building on our previous study of green simulation assisted policy gradient (GS-PG), which focused on trajectory-based reuse, we consider infinite-horizon Markov Decision Processes and create a new importance-sampling-based policy gradient optimization approach to support dynamic decision making. The existing GS-PG method was designed to learn from complete episodes or process trajectories, which limits its applicability to low-data situations and flexible online process control. To overcome this limitation, the proposed approach selectively reuses the most related partial trajectories, i.e., the reuse unit is based on per-step or per-decision historical observations. Specifically, we create a mixture likelihood ratio (MLR) based policy gradient optimization that can leverage the information from historical state-action transitions generated under different behavioral policies.…
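To make the per-step reuse idea concrete, the following is a minimal sketch of a mixture-likelihood-ratio weighted policy gradient estimate. It is an illustration only and not the authors' implementation. It assumes a one-dimensional Gaussian policy with mean `theta * s`, an equally weighted mixture of behavioral policies in the denominator, and precomputed advantage estimates. All function names are hypothetical, and the rule for selecting which transitions to reuse is not shown.

```python
import math


def gaussian_pdf(a, mean, std):
    """Density of N(mean, std^2) evaluated at a."""
    z = (a - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def policy_density(theta, s, a, std=1.0):
    """Assumed toy policy: the action is Gaussian with mean theta * s."""
    return gaussian_pdf(a, theta * s, std)


def mixture_likelihood_ratio(theta, behavior_thetas, s, a, std=1.0):
    """Target-policy density divided by the equally weighted mixture of
    behavioral-policy densities, evaluated at one (state, action) pair."""
    mixture = sum(policy_density(tk, s, a, std) for tk in behavior_thetas)
    mixture /= len(behavior_thetas)
    return policy_density(theta, s, a, std) / mixture


def mlr_policy_gradient(theta, behavior_thetas, transitions, std=1.0):
    """Average of weight * score * advantage over reused transitions.

    `transitions` is a list of (state, action, advantage) tuples that may
    have been generated under any of the behavioral policies.
    """
    total = 0.0
    for s, a, adv in transitions:
        w = mixture_likelihood_ratio(theta, behavior_thetas, s, a, std)
        # Score function of the assumed Gaussian policy: d/dtheta log pi(a|s).
        score = (a - theta * s) * s / (std * std)
        total += w * score * adv
    return total / len(transitions)
```

When the target policy is itself one of the `K` behavioral policies, each weight in this sketch is at most `K`, because the mixture density is at least `1/K` times the target density. That bound is the usual reason a mixture denominator gives lower-variance weights than a ratio against a single behavioral policy.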

Code & Models

Repositories

zhenghuazx/vrer_policy_optimization
tf · Official

Taxonomy

Topics: Energy, Environment, and Transportation Policies · Energy Efficiency and Management

Methods: Experience Replay