Optimal Control-Based Baseline for Guided Exploration in Policy Gradient Methods

Xubo Lyu; Site Li; Seth Siriya; Ye Pu; Mo Chen

arXiv:2011.02073 · cs.LG · November 7, 2024

TL;DR

This paper introduces an optimal control-based baseline for policy gradient methods in deep reinforcement learning. The baseline uses the value function of an optimal control problem to improve exploration, especially in sparse reward settings.

Contribution

It presents a novel baseline derived from an optimal control problem, shifting the baseline's role from variance reduction to guiding exploration during policy learning.
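For reference, the standard score-function (REINFORCE) estimator with a state-dependent baseline b(s) is shown below. This is textbook background, not the paper's own derivation. The second line shows why any state-dependent b leaves the expected gradient unchanged, which is what makes the baseline free to take on a role other than variance reduction. How the paper maps its optimal control value function V_OC to b (for example b(s) = -V_OC(s) when V_OC is a cost-to-go) is an assumption here.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T-1}
      \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bigl(G_t - b(s_t)\bigr)\right],
  \qquad G_t = \sum_{t'=t}^{T-1} r_{t'}

\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, b(s)\right]
  = b(s)\,\nabla_\theta \int \pi_\theta(a \mid s)\,da
  = b(s)\,\nabla_\theta 1 = 0
```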

Findings

01. Effective in sparse reward environments

02. Improves exploration during policy learning

03. Validated on robot learning tasks

Abstract

In this paper, a novel optimal control-based baseline function is presented for the policy gradient method in deep reinforcement learning (RL). The baseline is obtained by computing the value function of an optimal control problem that is formulated to be closely associated with the RL task. In contrast to the traditional baseline, which aims to reduce the variance of policy gradient estimates, our work uses the optimal control value function to introduce a new role for the baseline: providing guided exploration during policy learning. This aspect has received less attention in prior work. We validate our baseline on robot learning tasks, showing its effectiveness in guided exploration, particularly in sparse reward environments.
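As a rough illustration, and not the authors' implementation, the sketch below uses a scalar linear-quadratic regulator (LQR) as a stand-in for "an optimal control problem associated with the RL task". It computes the LQR cost-to-go by iterating the Riccati equation and plugs its negation in as the baseline of a REINFORCE gradient estimate. All names (`riccati_value_matrix`, `reinforce_grad`, `oc_baseline`) and constants are hypothetical. The sign flip assumes that the control value function measures cost while RL maximizes reward; the paper may use a different mapping. The sketch only shows where such a baseline enters the update and does not reproduce the paper's exploration mechanism.

```python
import numpy as np


def riccati_value_matrix(A, B, Q, R, iters=500):
    """Approximate P in the LQR cost-to-go V(x) = x^T P x by iterating the
    undiscounted discrete-time algebraic Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        # Optimal feedback gain for the current P (control law u = -K x)
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return P


def reinforce_grad(k, baseline, x0, a, b, q, r, horizon=20, sigma=0.5, rng=None):
    """Single-episode REINFORCE gradient w.r.t. the gain k of the scalar
    Gaussian policy u ~ N(-k x, sigma^2), with advantage G_t - baseline(x_t).
    Hypothetical dynamics x' = a x + b u; reward r_t = -(q x^2 + r u^2)."""
    rng = np.random.default_rng() if rng is None else rng
    xs, us, rewards = [], [], []
    x = x0
    for _ in range(horizon):
        u = -k * x + sigma * rng.standard_normal()
        xs.append(x)
        us.append(u)
        rewards.append(-(q * x**2 + r * u**2))
        x = a * x + b * u
    # Undiscounted reward-to-go: G_t = sum of r_t' for t' >= t
    returns = np.cumsum(rewards[::-1])[::-1]
    grad = 0.0
    for x_t, u_t, g_t in zip(xs, us, returns):
        # Score function: d/dk log N(u; -k x, sigma^2) = -(u + k x) x / sigma^2
        score = -(u_t + k * x_t) * x_t / sigma**2
        grad += score * (g_t - baseline(x_t))
    return grad


# Hypothetical toy task: scalar system with a = b = q = r = 1
a, b, q, r = 1.0, 1.0, 1.0, 1.0
P = riccati_value_matrix(np.array([[a]]), np.array([[b]]),
                         np.array([[q]]), np.array([[r]]))
p = P[0, 0]  # converges to (1 + sqrt(5)) / 2 for this system


def oc_baseline(x):
    # Reward is negative cost, so use the negated optimal cost-to-go
    return -p * x**2


rng = np.random.default_rng(0)
estimate = np.mean([reinforce_grad(0.2, oc_baseline, rng.normal(), a, b, q, r, rng=rng)
                    for _ in range(64)])
print(f"P = {p:.4f}, averaged gradient estimate at k = 0.2: {estimate:.3f}")
```

The baseline here is the infinite-horizon cost-to-go while episodes are truncated at 20 steps. Because any state-dependent baseline keeps the estimator unbiased, this mismatch can only affect the variance and per-sample direction of the updates, not their expectation.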

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Formal Methods in Verification