Mixed Policy Gradient: off-policy reinforcement learning driven jointly by data and model

Yang Guan; Jingliang Duan; Shengbo Eben Li; Jie Li; Jianyu Chen; Bo Cheng

arXiv:2102.11513 · cs.LG · February 27, 2024

TL;DR

This paper introduces the mixed policy gradient (MPG) algorithm that combines data-driven and model-driven policy gradients to improve convergence speed in reinforcement learning without sacrificing performance.

Contribution

The paper proposes a novel MPG algorithm that fuses data-driven and model-driven policy gradients and uses a heuristic weight-adjustment scheme to converge faster.
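The abstract says the weights are set by a rule grounded in upper bounds on each policy gradient's error, but the exact rule is not reproduced on this page. The sketch below therefore swaps in a simple, hypothetical inverse-error weighting, only to show the mechanics of turning two error estimates into normalized weights and mixing two gradient vectors. `PGWeights`, `inverse_error_weights`, and `mix_gradients` are illustrative names, not the authors' code, and the error estimates are assumed to be supplied from elsewhere.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PGWeights:
    data: float   # weight on the data-driven (critic-based) PG
    model: float  # weight on the model-driven (rollout-based) PG


def inverse_error_weights(err_data: float, err_model: float, floor: float = 1e-8) -> PGWeights:
    """Hypothetical rule (not the paper's): weight each PG inversely to its error estimate.

    Both inputs are assumed to be non-negative error estimates; the floor avoids
    division by zero, and the returned weights sum to 1.
    """
    if err_data < 0.0 or err_model < 0.0:
        raise ValueError("error estimates must be non-negative")
    inv_data = 1.0 / max(err_data, floor)
    inv_model = 1.0 / max(err_model, floor)
    total = inv_data + inv_model
    return PGWeights(data=inv_data / total, model=inv_model / total)


def mix_gradients(g_data: Sequence[float], g_model: Sequence[float], w: PGWeights) -> List[float]:
    """Element-wise weighted average of two policy-gradient vectors."""
    if len(g_data) != len(g_model):
        raise ValueError("gradient vectors must have the same length")
    return [w.data * gd + w.model * gm for gd, gm in zip(g_data, g_model)]


if __name__ == "__main__":
    w = inverse_error_weights(err_data=1.0, err_model=3.0)
    print(w)
    print(mix_gradients([1.0, 2.0], [3.0, 4.0], w))
```

Inverse-error weighting is only one way to act on a comparison of error bounds: it shifts weight toward whichever gradient is currently estimated to be more accurate, and with equal estimates it reduces to a plain average.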

Findings

01

MPG achieves faster convergence than baseline algorithms.

02

MPG attains superior asymptotic performance.

03

Heuristic weight adjustment improves learning efficiency.

Abstract

Reinforcement learning (RL) shows great potential in sequential decision-making. Mainstream RL algorithms are currently data-driven; compared with model-driven methods, they usually achieve better asymptotic performance but converge much more slowly. This paper proposes the mixed policy gradient (MPG) algorithm, which fuses empirical data and the transition model in the policy gradient (PG) to accelerate convergence without degrading performance. Formally, MPG is constructed as a weighted average of the data-driven and model-driven PGs, where the former is the derivative of the learned Q-value function and the latter is the derivative of the model-predictive return. To guide the weight design, we analyze and compare the upper bound of each PG's error. Based on that analysis, a rule-based method is employed to adjust the weights heuristically. In particular, to get a better PG, the weight of the data-driven…
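The weighted-average construction in the abstract can be illustrated on a toy problem. This is a minimal, hedged sketch, not the authors' implementation: the scalar linear system, the quadratic critic `q_learned`, the perfect transition model, the rollout horizon, the bootstrap with the learned Q-value at the end of the rollout, and every function name are assumptions made for illustration. Gradients are taken by central finite differences instead of automatic differentiation so the example needs only the standard library.

```python
from typing import Callable, Sequence

# Hypothetical toy setting (not from the paper): scalar linear system
# s' = A*s + B*u with reward r(s, u) = -(s^2 + u^2) and a linear policy u = theta*s.
A, B, GAMMA = 0.9, 0.5, 0.99


def model_step(s: float, u: float) -> float:
    """Transition model; assumed here to match the true dynamics exactly."""
    return A * s + B * u


def reward(s: float, u: float) -> float:
    return -(s * s + u * u)


def q_learned(s: float, u: float, w: tuple = (1.0, 0.2, 1.0)) -> float:
    """Hypothetical learned critic Q_w(s, u) = -(w0*s^2 + w1*s*u + w2*u^2)."""
    w0, w1, w2 = w
    return -(w0 * s * s + w1 * s * u + w2 * u * u)


def finite_diff(f: Callable[[float], float], theta: float, eps: float = 1e-5) -> float:
    """Central finite difference, standing in for autodiff."""
    return (f(theta + eps) - f(theta - eps)) / (2.0 * eps)


def data_driven_pg(theta: float, states: Sequence[float]) -> float:
    """Derivative of the learned Q-value Q_w(s, pi_theta(s)), averaged over states."""
    def objective(th: float) -> float:
        return sum(q_learned(s, th * s) for s in states) / len(states)
    return finite_diff(objective, theta)


def model_return(theta: float, s0: float, horizon: int) -> float:
    """n-step model-predictive return, bootstrapped with Q_w at the last state (assumption)."""
    s, ret = s0, 0.0
    for t in range(horizon):
        u = theta * s
        ret += GAMMA ** t * reward(s, u)
        s = model_step(s, u)
    return ret + GAMMA ** horizon * q_learned(s, theta * s)


def model_driven_pg(theta: float, states: Sequence[float], horizon: int = 5) -> float:
    """Derivative of the model-predictive return, averaged over start states."""
    def objective(th: float) -> float:
        return sum(model_return(th, s, horizon) for s in states) / len(states)
    return finite_diff(objective, theta)


def mixed_pg(theta: float, states: Sequence[float], w_data: float, horizon: int = 5) -> float:
    """Weighted average of the two PGs; the model-driven weight is 1 - w_data."""
    if not 0.0 <= w_data <= 1.0:
        raise ValueError("w_data must lie in [0, 1]")
    g_data = data_driven_pg(theta, states)
    g_model = model_driven_pg(theta, states, horizon)
    return w_data * g_data + (1.0 - w_data) * g_model


if __name__ == "__main__":
    theta, states = 0.0, [-1.0, -0.5, 0.5, 1.0]
    for _ in range(200):
        # Plain gradient ascent on the mixed PG with a fixed, hand-picked weight.
        theta += 0.05 * mixed_pg(theta, states, w_data=0.5)
    print(f"theta after 200 steps: {theta:.3f}")
```

Setting `horizon=0` collapses the model-driven gradient onto the data-driven one, which makes a quick sanity check. The fixed `w_data=0.5` is only a placeholder for the paper's rule-based weight adjustment, whose details are not reproduced here.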

Taxonomy

Topics: Policy Transfer and Learning