Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts
Sheryl Paul, Jyotirmoy V. Deshmukh

TL;DR
This paper introduces ERPO, an evolutionary-inspired algorithm that adaptively retrains policies for environments with drastic distribution shifts, outperforming traditional RL methods in pathfinding tasks.
Contribution
ERPO is a novel adaptive re-training algorithm based on evolutionary game theory, enabling efficient policy adaptation to significant environmental distribution shifts.
Findings
ERPO converges to optimal policies under common reward sparsity assumptions.
ERPO outperforms PPO, A3C, and DQN in various pathfinding environments.
ERPO achieves faster adaptation, higher rewards, and lower computational costs.
Abstract
Reinforcement learning (RL) has been successfully applied to the problem of finding obstacle-free paths for autonomous agents operating in stochastic and uncertain environments. However, when the underlying stochastic dynamics of the environment undergo drastic distribution shifts, the optimal policy obtained in the training environment may be sub-optimal, or may fail entirely to find goal-reaching paths for the agent. Approaches like domain randomization and robust RL can provide robust policies, but typically assume minor (bounded) distribution shifts. For substantial distribution shifts, retraining (either with a warm-start policy or from scratch) is an alternative approach. In this paper, we develop a novel approach called Evolutionary Robust Policy Optimization (ERPO), an adaptive re-training algorithm inspired by evolutionary game theory (EGT). ERPO learns an…
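The abstract is cut off before it describes ERPO's update rule, so the sketch below is not ERPO. It only illustrates the general evolutionary game theory idea the abstract refers to, using the standard discrete-time replicator update: each action's probability is rescaled by its fitness relative to the population mean. The names `replicator_step` and `evolve`, and the fixed positive fitness values, are assumptions made purely for illustration.

```python
def replicator_step(probs, fitness):
    """One discrete-time replicator update over a finite set of actions.

    probs:   current action probabilities (non-negative, summing to 1)
    fitness: strictly positive payoff per action (hypothetical values here)
    """
    if len(probs) != len(fitness):
        raise ValueError("probs and fitness must have the same length")
    if any(f <= 0 for f in fitness):
        raise ValueError("this sketch assumes strictly positive fitness")
    # Population-average fitness under the current distribution.
    mean_fitness = sum(p * f for p, f in zip(probs, fitness))
    # Actions fitter than average gain probability mass; the rest lose it.
    return [p * f / mean_fitness for p, f in zip(probs, fitness)]


def evolve(probs, fitness, steps):
    """Apply the replicator update repeatedly with a fixed fitness vector."""
    for _ in range(steps):
        probs = replicator_step(probs, fitness)
    return probs


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    payoff = [1.0, 2.0, 4.0, 1.0]  # hypothetical per-action payoffs
    print(replicator_step(uniform, payoff))
    print(evolve(uniform, payoff, 50))
```

With a fixed fitness vector, repeated updates concentrate probability on the fittest action while the distribution stays normalized. In an RL setting the fitness would instead come from estimated returns in the shifted environment, which is the part this sketch leaves out.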
Taxonomy
Methods: Dense Connections · Convolution · Entropy Regularization · Edge-augmented Graph Transformer · Softmax · A3C
