Comparing Behavioural Cloning and Reinforcement Learning for Spacecraft Guidance and Control Networks
Harry Holt, Sebastien Origer, Dario Izzo

TL;DR
This paper systematically compares behavioural cloning and reinforcement learning for training guidance and control networks in spacecraft trajectory optimization, highlighting their respective strengths and limitations.
Contribution
It introduces a novel RL training framework for G&CNETs and provides the first direct comparison between BC and RL in this context.
Findings
BC excels at mimicking expert policies but depends on training data quality.
RL demonstrates better adaptability and can find globally optimal solutions.
RL can outperform suboptimal expert demonstrations in trajectory optimization.
Abstract
Guidance & control networks (G&CNETs) provide a promising alternative to on-board guidance and control (G&C) architectures for spacecraft, offering a differentiable, end-to-end representation of the guidance and control architecture. When training G&CNETs, two predominant paradigms emerge: behavioural cloning (BC), which mimics optimal trajectories, and reinforcement learning (RL), which learns optimal behaviour through trial and error. Although both approaches have been adopted in G&CNET-related literature, direct comparisons are notably absent. To address this, we conduct a systematic evaluation of BC and RL specifically for training G&CNETs on continuous-thrust spacecraft trajectory optimisation tasks. We introduce a novel RL training framework tailored to G&CNETs, incorporating decoupled action and control frequencies alongside reward redistribution strategies to stabilise…
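To make the BC paradigm described in the abstract concrete, the following is a minimal sketch of behavioural cloning as supervised regression from states to expert controls. It is not the paper's method: the linear "expert" feedback law, its gain values, the two-dimensional state, and the function names `expert_control` and `fit_bc_policy` are all hypothetical stand-ins, and a linear least-squares fit replaces the neural network a G&CNET would use.

```python
import numpy as np


def expert_control(states):
    """Hypothetical expert: a linear feedback law u = -K x.

    In a real BC pipeline these labels would come from optimal
    trajectories; the gain K here is an arbitrary illustrative choice.
    """
    K = np.array([[1.0, 1.7]])
    return -states @ K.T


def fit_bc_policy(states, controls):
    """Fit a linear policy u = x @ W by least squares.

    This minimises the mean squared error between the policy output and
    the expert controls, which is the core idea of behavioural cloning.
    """
    W, *_ = np.linalg.lstsq(states, controls, rcond=None)
    return W


# Sample states and label them with the (assumed) expert.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(500, 2))
controls = expert_control(states)

# Clone the expert and measure the imitation error on the training set.
W = fit_bc_policy(states, controls)
bc_mse = float(np.mean((states @ W - controls) ** 2))
```

In this sketch the cloned policy can only reproduce whatever the expert labels encode, which illustrates why BC depends on the quality of its training data: a suboptimal expert would be imitated just as faithfully.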
Taxonomy
Topics: Spacecraft Dynamics and Control · Space Satellite Systems and Control · Adaptive Dynamic Programming Control
