# Interplanetary Transfers via Deep Representations of the Optimal Policy and/or of the Value Function

**Authors:** Dario Izzo, Ekin Öztürk, Marcus Märtens

arXiv: 1904.08809 · 2019-12-18

## TL;DR

This paper presents a method that generates large datasets of optimal interplanetary trajectories from a single nominal trajectory. Deep neural networks for spacecraft guidance can then be trained without solving any optimal control problem beyond the nominal one.

## Contribution

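The authors introduce a method that rapidly generates millions of optimal trajectories from one nominal trajectory, all of which satisfy Pontryagin's minimum principle and the relevant transversality conditions by construction. They use the resulting dataset to benchmark three deep-learning approaches to interplanetary trajectory optimization.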
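The generation idea can be illustrated on a problem small enough to check by hand. The sketch below is a minimal, hypothetical example, not the paper's implementation. It replaces the low-thrust spacecraft dynamics with a double integrator (`p' = v`, `v' = u`) under a minimum-energy cost, a fixed final time `T` and a fixed target state. It also assumes, rather than takes from the paper, that new trajectories are obtained by perturbing the nominal final costates and integrating the state-costate system backward. The names `generate_dataset`, `lam_f_nominal` and `sigma` are illustrative. Because the target state is fully fixed in this toy, the transversality conditions leave the final costates free. Each perturbed final costate, propagated backward, therefore yields another extremal that ends exactly on the target, and for this linear-quadratic problem every such extremal is optimal.

```python
"""Hypothetical toy: backward generation of optimal trajectories.

Assumed problem (not the paper's spacecraft model):
  p' = v,  v' = u,  minimise J = 1/2 * integral(u^2) dt,
  fixed final time T, fixed final state (p_f, v_f).
Pontryagin: H = u^2/2 + lam_p*v + lam_v*u  ->  u* = -lam_v,
            lam_p' = -dH/dp = 0,  lam_v' = -dH/dv = -lam_p.
"""
import random


def augmented_dynamics(y):
    """Time derivative of y = (p, v, lam_p, lam_v) along an extremal."""
    p, v, lam_p, lam_v = y
    u = -lam_v  # the control that minimises the Hamiltonian
    return [v, u, 0.0, -lam_p]


def rk4(y, h, n_steps):
    """Fixed-step RK4; a negative h integrates backward in time.

    Returns the list of states visited, including the starting one.
    """
    path = [list(y)]
    for _ in range(n_steps):
        k1 = augmented_dynamics(y)
        k2 = augmented_dynamics([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = augmented_dynamics([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = augmented_dynamics([a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
        path.append(list(y))
    return path


def generate_dataset(x_f, lam_f_nominal, T, n_traj, sigma, n_steps=100, seed=0):
    """Perturb the nominal final costates and propagate each one backward from x_f.

    Returns tuples (time_to_go, p, v, lam_p, lam_v, u_star). Every point on
    every arc is a sample: the tail of an optimal arc is itself optimal.
    """
    rng = random.Random(seed)
    h = -T / n_steps  # negative step: integrate from t = T back to t = 0
    samples = []
    for _ in range(n_traj):
        lam_f = [l + rng.gauss(0.0, sigma) for l in lam_f_nominal]
        path = rk4(list(x_f) + lam_f, h, n_steps)
        for k, (p, v, lam_p, lam_v) in enumerate(path):
            samples.append((k * T / n_steps, p, v, lam_p, lam_v, -lam_v))
    return samples


if __name__ == "__main__":
    data = generate_dataset(x_f=[1.0, 0.0], lam_f_nominal=[-1.0, 0.5],
                            T=2.0, n_traj=50, sigma=0.2)
    print(len(data))  # 5050: 50 arcs x 101 points each
```

Only the nominal final costates (here an arbitrary hypothetical choice) would come from solving an optimal control problem. Everything else is a single backward integration per trajectory, which is what makes datasets of millions of samples cheap to produce.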

## Key findings

- Policy imitation and value function gradient learning both learn the optimal state feedback policy.
- Value function learning captures the final propellant mass but not the optimal policy.
- Apart from computing the nominal trajectory, generating the training data requires solving no further optimal control problems.
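The three learning methods can be read as three labelings of the same sampled states: the optimal control (policy imitation), the optimal cost-to-go (value function learning), or its gradient (value function gradient learning). Where the value function `V` is differentiable, the costate along an optimal trajectory equals the gradient of `V`, so gradient labels come directly from the generated costates, and the control follows by minimizing the Hamiltonian. The sketch below checks these relations on a hypothetical toy problem, not on the paper's spacecraft model: a double integrator (`p' = v`, `v' = u`) with cost `1/2 * integral(u^2)`, fixed time-to-go `tau` and a fixed target `x_f`, solved in closed form. All function names are illustrative.

```python
"""Hypothetical check on a double-integrator toy (not the paper's model):
the costate of each generated sample equals the gradient of the optimal
cost-to-go, and the optimal control follows from that gradient."""


def extremal_at(tau, x_f, lam_f):
    """State and costate at time-to-go tau on the extremal ending at x_f with costate lam_f.

    Closed-form backward solution of p' = v, v' = u, u = -lam_v,
    lam_p' = 0, lam_v' = -lam_p.
    """
    p_f, v_f = x_f
    a, b = lam_f  # lam_p (constant along the arc) and lam_v at the final time
    lam_v = b + a * tau
    v = v_f + b * tau + a * tau ** 2 / 2
    p = p_f - v_f * tau - b * tau ** 2 / 2 - a * tau ** 3 / 6
    return (p, v), (a, lam_v)


def cost_along(tau, lam_f):
    """Value label: 1/2 * integral of u^2 over the remaining arc, where u = -(b + a*s)."""
    a, b = lam_f
    return 0.5 * (b ** 2 * tau + a * b * tau ** 2 + a ** 2 * tau ** 3 / 3)


def value(x, tau, x_f):
    """Closed-form minimum cost-to-go from x with time-to-go tau.

    V = 1/2 * d^T W^-1 d, where d is the displacement still to be produced by
    the control and W is the controllability Gramian of the double integrator.
    """
    d1 = x_f[0] - x[0] - x[1] * tau
    d2 = x_f[1] - x[1]
    c1 = 12 * d1 / tau ** 3 - 6 * d2 / tau ** 2
    c2 = -6 * d1 / tau ** 2 + 4 * d2 / tau
    return 0.5 * (d1 * c1 + d2 * c2)


def grad_fd(x, tau, x_f, eps=1e-5):
    """Central finite-difference gradient of the value function with respect to (p, v)."""
    gp = (value((x[0] + eps, x[1]), tau, x_f) - value((x[0] - eps, x[1]), tau, x_f)) / (2 * eps)
    gv = (value((x[0], x[1] + eps), tau, x_f) - value((x[0], x[1] - eps), tau, x_f)) / (2 * eps)
    return gp, gv


def control_from_gradient(grad_v):
    """Minimise H = u^2/2 + lam_v*u with lam_v = dV/dv, giving u* = -dV/dv."""
    return -grad_v


if __name__ == "__main__":
    x0, lam0 = extremal_at(2.0, (1.0, 0.0), (2.0, -1.0))
    print(x0, lam0)                      # sample state and its costate (gradient label)
    print(cost_along(2.0, (2.0, -1.0)))  # value label for the same sample
    print(grad_fd(x0, 2.0, (1.0, 0.0)))  # should agree with lam0 up to rounding
```

A network trained on value labels alone must recover the control by differentiating its own output, whereas gradient labels supervise that derivative directly. This is one plausible way to read the findings above; the sketch does not reproduce the paper's experiments.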

## Abstract

A number of applications to interplanetary trajectories have been recently proposed based on deep networks. These approaches often rely on the availability of a large number of optimal trajectories to learn from. In this paper we introduce a new method to quickly create millions of optimal spacecraft trajectories from a single nominal trajectory. Apart from the generation of the nominal trajectory, no additional optimal control problems need to be solved as all the trajectories, by construction, satisfy Pontryagin's minimum principle and the relevant transversality conditions. We then consider deep feed forward neural networks and benchmark three learning methods on the created dataset: policy imitation, value function learning and value function gradient learning. Our results are shown for the case of the interplanetary trajectory optimization problem of reaching Venus orbit, with the nominal trajectory starting from the Earth. We find that both policy imitation and value function gradient learning are able to learn the optimal state feedback, while in the case of value function learning the optimal policy is not captured, only the final value of the optimal propellant mass is.
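For reference, the conditions that every generated trajectory satisfies by construction are the standard necessary conditions of Pontryagin's minimum principle. They are written here in a generic form, assuming dynamics $\dot{\mathbf x} = \mathbf f(\mathbf x, \mathbf u)$, a cost $J = \varphi(\mathbf x(t_f)) + \int_{t_0}^{t_f} L(\mathbf x, \mathbf u)\,dt$ and terminal constraints $\boldsymbol\psi(\mathbf x(t_f)) = \mathbf 0$ with multipliers $\boldsymbol\nu$; the paper's specific low-thrust dynamics, cost and multipliers are not reproduced.

$$
\begin{aligned}
\mathcal H(\mathbf x, \boldsymbol\lambda, \mathbf u) &= L(\mathbf x, \mathbf u) + \boldsymbol\lambda^\top \mathbf f(\mathbf x, \mathbf u), \\
\dot{\mathbf x} &= \frac{\partial \mathcal H}{\partial \boldsymbol\lambda}, \qquad
\dot{\boldsymbol\lambda} = -\frac{\partial \mathcal H}{\partial \mathbf x}, \\
\mathbf u^*(t) &= \arg\min_{\mathbf u \in \mathcal U} \mathcal H\big(\mathbf x(t), \boldsymbol\lambda(t), \mathbf u\big), \\
\boldsymbol\lambda(t_f) &= \nabla_{\mathbf x}\varphi\big(\mathbf x(t_f)\big) + \left(\frac{\partial \boldsymbol\psi}{\partial \mathbf x}\right)^{\!\top}\!\boldsymbol\nu
\quad \text{(transversality)}, \\
\mathcal H(t_f) &= 0 \quad \text{if } t_f \text{ is free}, \\
\boldsymbol\lambda(t) &= \nabla_{\mathbf x} V\big(\mathbf x(t), t\big) \quad \text{wherever the value function } V \text{ is differentiable.}
\end{aligned}
$$

Integrating the state-costate system backward from any final condition that satisfies the transversality conditions, with $\mathbf u^*$ chosen by minimizing $\mathcal H$, produces an arc that satisfies the state, costate and minimum conditions at every instant. The last relation is what allows costates from the generated trajectories to serve as value-gradient labels.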

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.08809/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1904.08809/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/1904.08809/full.md

---
Source: https://tomesphere.com/paper/1904.08809