Non-Parametric Stochastic Policy Gradient with Strategic Retreat for Non-Stationary Environment

Apan Dastider; Mingjie Lin

arXiv:2203.14905 · cs.RO · March 29, 2022

Open Access

TL;DR

This paper introduces a non-parametric, kernel-based policy learning method that adapts to non-stationary environments in robotics, outperforming traditional parametric methods like DDPG and TD3.

Contribution

It proposes a novel non-parametric, kernel-based approach that uses adaptive windowing to track changes in environment dynamics, improving control policy learning in robotics.
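The summary does not say how the adaptive windowing works. As an illustration of the general idea only, the sketch below implements a naive version of ADWIN (ADaptive WINdowing, Bifet and Gavaldà): it keeps a window of recent observations in [0, 1] (here assumed to be normalized episode returns) and drops the oldest ones whenever some split of the window into an older and a newer part shows means that differ by more than a Hoeffding-style threshold. The class `SimpleAdaptiveWindow`, the `delta` parameter, and the choice of returns as the monitored signal are hypothetical; this is not presented as the paper's mechanism.

```python
import math
from collections import deque


class SimpleAdaptiveWindow:
    """Naive ADWIN-style change detector for values in [0, 1].

    Hypothetical illustration: O(n) work per cut check, no bucket compression.
    """

    def __init__(self, delta: float = 0.002):
        self.delta = delta      # confidence parameter of the cut test
        self.window = deque()   # oldest value on the left, newest on the right

    def update(self, value: float) -> bool:
        """Append a value; return True if older data was dropped (change detected)."""
        self.window.append(value)
        changed = False
        # Keep dropping the oldest element while some split still shows a change.
        while self._cut_once():
            changed = True
        return changed

    def _cut_once(self) -> bool:
        n = len(self.window)
        if n < 2:
            return False
        values = list(self.window)
        total = sum(values)
        delta_prime = self.delta / n
        head_sum = 0.0
        # Test every split into an older part W0 (size n0) and newer part W1 (size n1).
        for i in range(1, n):
            head_sum += values[i - 1]
            n0, n1 = i, n - i
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-mean-style sample size
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                self.window.popleft()  # discard the oldest observation, then re-test
                return True
        return False

    @property
    def mean(self) -> float:
        return sum(self.window) / len(self.window) if self.window else 0.0


if __name__ == "__main__":
    detector = SimpleAdaptiveWindow(delta=0.002)
    # Stationary phase: returns around 0.2, then the dynamics shift to 0.8.
    stream = [0.2] * 100 + [0.8] * 100
    flags = [detector.update(r) for r in stream]
    print("change detected:", any(flags), "window size:", len(detector.window))
```

The design choice here is simplicity over speed: the original ADWIN compresses the window into exponential buckets to avoid re-scanning every split, which this sketch deliberately omits.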

Findings

01. Outperforms DDPG and TD3 in dynamic environments

02. Effective when searching very high-dimensional RKHS

03. Validated on multiple benchmarks with superior results

Abstract

In modern robotics, effectively computing optimal control policies under dynamically varying environments poses substantial challenges to off-the-shelf parametric policy gradient methods, such as Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3). In this paper, we propose a systematic methodology to dynamically learn a sequence of optimal control policies non-parametrically, while autonomously adapting to the constantly changing environment dynamics. Specifically, our non-parametric kernel-based methodology embeds a policy distribution as the features in a non-decreasing Euclidean space, thereby allowing its search space to be defined as a very high (possibly infinite) dimensional RKHS (Reproducing Kernel Hilbert Space). Moreover, by leveraging the similarity metric computed in RKHS, we augment our non-parametric learning…
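The abstract does not spell out which similarity metric is computed in the RKHS. One standard way to compare two distributions there is the maximum mean discrepancy (MMD), the RKHS distance between their kernel mean embeddings: MMD² = E[k(x, x')] + E[k(y, y')] − 2 E[k(x, y)]. The minimal pure-Python sketch below assumes a Gaussian kernel and treats actions sampled from two policies as the two distributions; `gaussian_kernel`, `mmd_squared`, and the `bandwidth` value are hypothetical choices for illustration, not the authors' method.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def _mean_kernel(xs: List[Vector], ys: List[Vector], bandwidth: float) -> float:
    """Average of k(x, y) over all pairs, i.e. <mu_X, mu_Y> in the RKHS."""
    total = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: List[Vector], ys: List[Vector], bandwidth: float = 1.0) -> float:
    """Biased empirical estimate of ||mu_X - mu_Y||^2 between two sample sets."""
    return (
        _mean_kernel(xs, xs, bandwidth)
        + _mean_kernel(ys, ys, bandwidth)
        - 2.0 * _mean_kernel(xs, ys, bandwidth)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 1-D Gaussian "policies": sampled actions from each.
    old_policy = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
    same_policy = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
    shifted_policy = [[rng.gauss(2.0, 1.0)] for _ in range(200)]
    print("MMD^2 (same dynamics):   ", mmd_squared(old_policy, same_policy))
    print("MMD^2 (shifted dynamics):", mmd_squared(old_policy, shifted_policy))
```

Because the estimate only needs pairwise kernel evaluations, it never forms the (possibly infinite-dimensional) feature vectors explicitly, which is what makes comparisons in a very high-dimensional RKHS tractable.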

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Target Policy Smoothing · Clipped Double Q-learning · Twin Delayed Deep Deterministic · Dense Connections · Convolution · Batch Normalization · Experience Replay · Weight Decay · Adam