Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts

Sheryl Paul; Jyotirmoy V. Deshmukh

arXiv:2410.19852·cs.LG·October 29, 2024

TL;DR

This paper introduces ERPO, an adaptive retraining algorithm inspired by evolutionary game theory for environments that undergo drastic distribution shifts; it outperforms PPO, A3C, and DQN on pathfinding tasks.

Contribution

ERPO is a novel adaptive re-training algorithm based on evolutionary game theory, enabling efficient policy adaptation to significant environmental distribution shifts.

Findings

01

ERPO converges to optimal policies under common reward sparsity assumptions.

02

ERPO outperforms PPO, A3C, and DQN in various pathfinding environments.

03

ERPO achieves faster adaptation, higher rewards, and lower computational costs.

Abstract

Reinforcement learning (RL) has been successfully applied to solve the problem of finding obstacle-free paths for autonomous agents operating in stochastic and uncertain environments. However, when the underlying stochastic dynamics of the environment experience drastic distribution shifts, the optimal policy obtained in the trained environment may be sub-optimal or may entirely fail to find goal-reaching paths for the agent. Approaches like domain randomization and robust RL can provide robust policies, but typically assume minor (bounded) distribution shifts. For substantial distribution shifts, retraining (either with a warm-start policy or from scratch) is an alternative approach. In this paper, we develop a novel approach called Evolutionary Robust Policy Optimization (ERPO), an adaptive re-training algorithm inspired by evolutionary game theory (EGT). ERPO learns an…
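
As background for the evolutionary game theory (EGT) inspiration mentioned in the abstract, here is a minimal Python sketch of the textbook discrete-time replicator update, in which a strategy's population share grows in proportion to its fitness relative to the population average. It is not ERPO's update rule, which the text above does not specify. The names `replicator_step` and `evolve` and the fitness values are hypothetical, chosen only for illustration.

```python
from typing import List


def replicator_step(shares: List[float], fitness: List[float]) -> List[float]:
    """One discrete-time replicator update (generic EGT, not ERPO itself).

    shares  -- population share of each strategy (non-negative, sums to 1)
    fitness -- strictly positive fitness of each strategy (assumed fixed here)
    """
    if len(shares) != len(fitness):
        raise ValueError("shares and fitness must have the same length")
    if any(f <= 0 for f in fitness):
        raise ValueError("this sketch assumes strictly positive fitness")
    # Average fitness of the population under the current shares.
    mean_fitness = sum(x * f for x, f in zip(shares, fitness))
    # Each share is rescaled by its fitness relative to the average,
    # so the updated shares still sum to 1.
    return [x * f / mean_fitness for x, f in zip(shares, fitness)]


def evolve(shares: List[float], fitness: List[float], steps: int) -> List[float]:
    """Apply the replicator update `steps` times."""
    for _ in range(steps):
        shares = replicator_step(shares, fitness)
    return shares


if __name__ == "__main__":
    # Two hypothetical strategies with equal initial shares.
    print(replicator_step([0.5, 0.5], [1.0, 2.0]))
    print(evolve([0.5, 0.5], [1.0, 2.0], 50))
```

Starting from equal shares with fitness values 1.0 and 2.0, a single step moves the shares to 1/3 and 2/3, and repeated steps concentrate the population on the fitter strategy.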

Taxonomy

Methods: Dense Connections · Convolution · Entropy Regularization · Edge-augmented Graph Transformer · Softmax · A3C