Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

Xiao Hu; Tianshu Wang; Min Gong; Shaoshi Yang

arXiv:2405.03711 · cs.LG · May 8, 2024

TL;DR

This paper introduces a guidance design method for escape flight vehicles (EFVs) that combines deep reinforcement learning with an evolution strategy, achieving higher residual velocities in simulation than the benchmark algorithms.

Contribution

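The paper proposes a two-step guidance optimization approach: proximal policy optimization (PPO) first produces the guidance policy, and an evolution strategy (ES) then refines it to raise the residual velocity of the escape vehicle.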
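The second step of such a two-step scheme can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a simple (1+λ) evolution strategy run on a toy objective, and every name in it (`toy_score`, `es_refine`, `theta_ppo`) is hypothetical. The starting vector stands in for parameters already trained by PPO, and the toy objective stands in for the residual velocity that a full pursuit-evasion simulation would return.

```python
import random


def toy_score(theta):
    """Hypothetical stand-in for a simulated rollout score.

    Peaks at 0.0 when theta equals the (arbitrary) target vector.
    A real setup would run the pursuit-evasion simulation instead.
    """
    target = [0.5, -1.0, 2.0]
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def es_refine(theta0, score_fn, sigma=0.1, population=20, generations=50, seed=0):
    """Refine theta0 with a simple (1+lambda) evolution strategy.

    Each generation samples `population` Gaussian perturbations of the
    current best vector and keeps the best candidate only if it improves
    the score, so the score never decreases.
    """
    rng = random.Random(seed)
    best = list(theta0)
    best_score = score_fn(best)
    for _ in range(generations):
        candidates = [
            [t + sigma * rng.gauss(0.0, 1.0) for t in best]
            for _ in range(population)
        ]
        top = max(candidates, key=score_fn)
        top_score = score_fn(top)
        if top_score > best_score:
            best, best_score = top, top_score
    return best, best_score


# Assumed starting point: parameters produced by an earlier PPO stage.
theta_ppo = [0.3, -0.8, 1.7]
initial_score = toy_score(theta_ppo)
refined, refined_score = es_refine(theta_ppo, toy_score)
```

Because the update is elitist, `refined_score` can never fall below `initial_score`. That mirrors the idea of using ES only to improve on a solution PPO has already found, although the ES variant, population size, and noise scale here are assumptions.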

Findings

01. PPO-based guidance achieves a residual velocity of 67.24 m/s.

02. ES-enhanced PPO raises the residual velocity to 69.04 m/s, a gain of 1.80 m/s (about 2.7%).

03. The method outperforms benchmark algorithms in simulation.

Abstract

Guidance commands of flight vehicles are a series of data sets with fixed time intervals; thus guidance design constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. For the EFV, the objective of the guidance design entails progressively maximizing the residual velocity, subject to the constraint imposed by the given evasion distance. Thus an irregular dynamic max-min problem of extremely large scale is formulated, where the time instant when the optimal solution can be attained is uncertain and the optimal solution depends on all the intermediate guidance commands generated before. For solving…
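The proportional navigation method that the abstract assigns to the PFV can be sketched in its textbook planar form, where the commanded lateral acceleration is the navigation gain times the closing speed times the line-of-sight rotation rate. The abstract does not give the paper's exact variant, so the 2D geometry, the function name `pn_acceleration`, and the gain of 3.0 are illustrative assumptions.

```python
import math


def pn_acceleration(pursuer_pos, pursuer_vel, target_pos, target_vel, nav_gain=3.0):
    """Planar proportional navigation: a = N * Vc * d(lambda)/dt.

    Positions and velocities are (x, y) tuples in consistent units.
    nav_gain (N) is an assumed value, not one taken from the paper.
    """
    # Relative position and velocity of the target with respect to the pursuer.
    rx = target_pos[0] - pursuer_pos[0]
    ry = target_pos[1] - pursuer_pos[1]
    vx = target_vel[0] - pursuer_vel[0]
    vy = target_vel[1] - pursuer_vel[1]

    range_sq = rx * rx + ry * ry
    if range_sq == 0.0:
        raise ValueError("pursuer and target are at the same position")

    # Line-of-sight angular rate (rad/s) from the 2D cross product.
    los_rate = (rx * vy - ry * vx) / range_sq
    # Closing speed is positive when the range is shrinking.
    closing_speed = -(rx * vx + ry * vy) / math.sqrt(range_sq)

    return nav_gain * closing_speed * los_rate


# Head-on geometry: the line of sight does not rotate, so no correction is needed.
head_on = pn_acceleration((0.0, 0.0), (100.0, 0.0), (1000.0, 0.0), (-50.0, 0.0))

# Target crossing the line of sight: a nonzero lateral command results.
crossing = pn_acceleration((0.0, 0.0), (100.0, 0.0), (1000.0, 0.0), (0.0, 50.0))
```

In the crossing case the line-of-sight rate is 0.05 rad/s and the closing speed is 100 m/s, so the command is approximately 15 m/s² with the assumed gain. An escaping vehicle's guidance has to work against exactly this kind of correction, which is why the pursuer's law shapes the optimization problem described above.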

Taxonomy

Methods: Entropy Regularization · Proximal Policy Optimization