Non-Parametric Stochastic Policy Gradient with Strategic Retreat for Non-Stationary Environment
Apan Dastider, Mingjie Lin

TL;DR
This paper introduces a non-parametric, kernel-based policy learning method that adapts to non-stationary environments in robotics, outperforming traditional parametric methods like DDPG and TD3.
Contribution
It proposes a novel non-parametric kernel-based approach with adaptive windowing for dynamic environment adaptation, enhancing control policy learning in robotics.
Findings
Outperforms DDPG and TD3 in dynamic environments
Effective when the policy search space is a high-dimensional RKHS
Validated on multiple benchmarks with superior results
Abstract
In modern robotics, effectively computing optimal control policies under dynamically varying environments poses substantial challenges for off-the-shelf parametric policy gradient methods, such as the Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3). In this paper, we propose a systematic methodology to dynamically learn a sequence of optimal control policies non-parametrically, while autonomously adapting to the constantly changing environment dynamics. Specifically, our non-parametric kernel-based methodology embeds a policy distribution as the features in a non-decreasing Euclidean space, therefore allowing its search space to be defined as a very high (possibly infinite) dimensional RKHS (Reproducing Kernel Hilbert Space). Moreover, by leveraging the similarity metric computed in RKHS, we augmented our non-parametric learning…
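To make the idea of a policy living in an RKHS more concrete, the following is a minimal sketch of a generic kernel-based Gaussian policy with a REINFORCE-style functional gradient step. It is an illustration under stated assumptions, not the authors' algorithm: the RBF kernel, the scalar action, the fixed noise level `sigma`, and the names `rbf_kernel`, `KernelGaussianPolicy`, and `update` are all hypothetical, and the adaptive windowing and strategic retreat mentioned elsewhere on this page are not modeled.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two state vectors (assumed kernel choice)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * bandwidth ** 2)))


class KernelGaussianPolicy:
    """Gaussian policy whose mean is a kernel expansion h(s) = sum_i w_i k(c_i, s).

    The mean function is an element of the RKHS induced by the kernel, so the
    policy has no fixed parameter vector; it grows with the visited states.
    """

    def __init__(self, sigma=0.5, bandwidth=1.0, seed=0):
        self.sigma = sigma            # fixed exploration noise (assumption)
        self.bandwidth = bandwidth
        self.centers = []             # states used as kernel centers
        self.weights = []             # one scalar weight per center
        self.rng = np.random.default_rng(seed)

    def mean(self, state):
        """Evaluate the mean action h(state); 0.0 when no centers exist yet."""
        return sum(
            (w * rbf_kernel(c, state, self.bandwidth)
             for c, w in zip(self.centers, self.weights)),
            0.0,
        )

    def sample(self, state):
        """Draw a scalar action from N(h(state), sigma^2)."""
        return self.mean(state) + self.sigma * self.rng.standard_normal()

    def update(self, states, actions, returns, lr=0.1):
        """One functional-gradient ascent step on the expected return.

        For a Gaussian policy, the functional gradient of log pi(a|s) with
        respect to h is k(s, .) * (a - h(s)) / sigma^2, so each visited state
        becomes a new kernel center with a return-weighted coefficient.
        """
        # Compute every coefficient with the current mean before appending.
        new_weights = [
            lr * g * (a - self.mean(s)) / self.sigma ** 2
            for s, a, g in zip(states, actions, returns)
        ]
        for s, w in zip(states, new_weights):
            self.centers.append(np.asarray(s, dtype=float))
            self.weights.append(w)


# Usage: one update from a single (state, action, return) sample.
policy = KernelGaussianPolicy(sigma=0.5, bandwidth=1.0)
policy.update(states=[[0.0]], actions=[1.0], returns=[2.0], lr=0.1)
action = policy.sample([0.0])
```

In this sketch the set of kernel centers grows with every update. A practical variant would need some form of sparsification or forgetting to stay tractable and to track changing dynamics; how the paper handles this is not specified in the truncated abstract above.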
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Target Policy Smoothing · Clipped Double Q-learning · Twin Delayed Deep Deterministic · Dense Connections · Convolution · Batch Normalization · Experience Replay · Weight Decay · Adam
