Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations

Guojian Wang; Faguo Wu; Xiao Zhang; Tianyuan Chen

arXiv:2401.00162 · cs.LG · October 28, 2024 · 1 citation

Open Access

TL;DR

This paper introduces Policy Optimization with Smooth Guidance (POSG), a reinforcement learning algorithm that uses state-only demonstrations to improve learning efficiency and control performance in sparse-reward environments, so the demonstrations do not need to include expert actions.

Contribution

The paper proposes POSG, an efficient method leveraging state-only demonstrations with a trajectory importance mechanism to guide policy optimization in sparse-reward settings.

Findings

01 POSG outperforms the baselines in control performance.

02 POSG converges faster in four benchmark environments.

03 State-only demonstrations provide effective guidance.

Abstract

The sparsity of reward feedback remains a challenging problem in online deep reinforcement learning (DRL). Previous approaches have utilized offline demonstrations to achieve impressive results in multiple hard tasks. However, these approaches place high demands on demonstration quality, and obtaining expert-like actions is often costly and unrealistic. To tackle these problems, we propose a simple and efficient algorithm called Policy Optimization with Smooth Guidance (POSG), which leverages a small set of state-only demonstrations (where expert action information is not included in demonstrations) to indirectly make approximate and feasible long-term credit assignments and facilitate exploration. Specifically, we first design a trajectory-importance evaluation mechanism to determine the quality of the current trajectory against demonstrations. Then, we introduce a guidance reward…
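
The idea of scoring the current trajectory against state-only demonstrations can be illustrated with a minimal sketch. The abstract excerpt does not say which measure POSG uses, so everything here is an assumption made for illustration: the function name `trajectory_importance`, the nearest-neighbor Euclidean distance, the exponential squashing, and the `temperature` parameter are all hypothetical and should not be read as the authors' mechanism. Note that only states are compared; no expert actions are needed.

```python
import math
from typing import Sequence

State = Sequence[float]


def trajectory_importance(
    trajectory: Sequence[State],
    demo_states: Sequence[State],
    temperature: float = 1.0,  # hypothetical scale parameter, not from the paper
) -> float:
    """Return a score in (0, 1] for how closely a trajectory follows the demos.

    Assumed measure (illustrative only): the mean distance from each visited
    state to its nearest demonstration state, mapped through exp(-d / T).
    A score of 1.0 means every visited state coincides with a demo state.
    """
    if not trajectory or not demo_states:
        raise ValueError("trajectory and demo_states must be non-empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Distance from each visited state to the closest demonstration state.
    gaps = [min(math.dist(s, d) for d in demo_states) for s in trajectory]
    mean_gap = sum(gaps) / len(gaps)
    return math.exp(-mean_gap / temperature)


# Usage: a trajectory that stays on the demonstrated path scores higher
# than one that drifts away from it.
demo = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
near = [(0.0, 0.0), (1.0, 0.0)]
far = [(0.0, 3.0), (1.0, 3.0)]
near_score = trajectory_importance(near, demo)
far_score = trajectory_importance(far, demo)
```

A trajectory-level score of this kind gives a dense signal even when the environment reward is sparse, which is the role the abstract assigns to the importance evaluation; how POSG turns its score into a reward is not covered by this sketch.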

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning

Methods: Sparse Evolutionary Training · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings