Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
Xiao Hu, Tianshu Wang, Min Gong, Shaoshi Yang

TL;DR
This paper introduces a guidance design method for escape flight vehicles that combines deep reinforcement learning with evolution strategies, raising the residual velocity in simulation from 67.24 m/s (PPO alone) to 69.04 m/s.
Contribution
It proposes a two-step guidance optimization approach combining PPO and evolution strategies to improve escape vehicle performance.
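The two-step idea can be illustrated with a minimal sketch: start from a parameter vector (standing in for policy weights already trained by PPO) and refine it with a simple Gaussian-perturbation evolution strategy. This is not the authors' implementation. The names `es_refine` and `toy_fitness`, the hyperparameters, and the quadratic objective are all assumptions made for illustration; in the paper's setting the fitness would come from a flight simulation that reports residual velocity.

```python
import numpy as np

def es_refine(theta0, fitness, iterations=50, population=20,
              sigma=0.1, lr=0.05, seed=0):
    """Refine a parameter vector with a basic evolution strategy.

    theta0  : starting parameters (hypothetically, PPO-trained weights)
    fitness : callable mapping a parameter vector to a scalar score
    Returns the best parameters seen and their score.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    best_theta, best_fit = theta.copy(), fitness(theta)

    for _ in range(iterations):
        # Sample Gaussian perturbations around the current parameters.
        noise = rng.standard_normal((population, theta.size))
        scores = np.array([fitness(theta + sigma * eps) for eps in noise])

        # Standardize scores so the step size does not depend on their scale.
        std = scores.std()
        if std > 0:
            advantages = (scores - scores.mean()) / std
            # Move along the score-weighted average of the perturbations.
            theta = theta + lr / (population * sigma) * noise.T @ advantages

        # Keep the best point seen so far, so the result never gets worse
        # than the starting parameters.
        current = fitness(theta)
        if current > best_fit:
            best_theta, best_fit = theta.copy(), current

    return best_theta, best_fit

def toy_fitness(theta):
    # Assumed stand-in objective: higher is better, maximum at `target`.
    target = np.array([1.0, -2.0, 0.5])
    return -float(np.sum((theta - target) ** 2))

theta_start = np.zeros(3)  # placeholder for PPO-trained weights
theta_refined, refined_score = es_refine(theta_start, toy_fitness)
```

Because the sketch keeps the best parameters seen so far, the refined score is never lower than the starting score, which mirrors the role of the second step as an improvement on top of an already trained policy.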
Findings
PPO-based guidance achieves 67.24 m/s residual velocity.
ES-enhanced PPO improves residual velocity to 69.04 m/s, a gain of 1.80 m/s (about 2.7%) over PPO alone.
The method outperforms benchmark algorithms in simulation.
Abstract
Guidance commands of flight vehicles are a series of data sets with fixed time intervals; thus guidance design constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. For the EFV, the objective of the guidance design entails progressively maximizing the residual velocity, subject to the constraint imposed by the given evasion distance. Thus an irregular dynamic max-min problem of extremely large scale is formulated, where the time instant when the optimal solution can be attained is uncertain and the optimal solution depends on all the intermediate guidance commands generated before. For solving…
Taxonomy
Methods: Entropy Regularization · Proximal Policy Optimization
