# Exposure-Based Multi-Agent Inspection of a Tumbling Target Using Deep Reinforcement Learning

**Authors:** Joshua Aurand, Steven Cutlip, Henry Lei, Kendra Lang, and Sean Phillips

arXiv: 2302.14188 · 2023-05-02

## TL;DR

This paper introduces a hierarchical, deep reinforcement learning-based multi-agent approach for autonomous on-orbit inspection of tumbling targets, achieving high coverage without continuous ground control or attitude adjustments.

## Contribution

It presents a novel decentralized planning framework combining high-level RL-based viewpoint selection with low-level navigation, extendable to unknown geometries and sensor inputs.

## Key findings

- Successfully inspects over 90% of non-convex tumbling targets
- Operates effectively with limited information and without attitude control
- Demonstrates robustness in complex, unpredictable environments

## Abstract

As space becomes more congested, on-orbit inspection is an increasingly relevant activity, whether to observe a defunct satellite for planning repairs or to de-orbit it. However, the task of on-orbit inspection itself is challenging, typically requiring the careful coordination of multiple observer satellites. This is complicated by a highly nonlinear environment where the target may be unknown or moving unpredictably without time for continuous command and control from the ground. There is a need for autonomous, robust, decentralized solutions to the inspection task. To achieve this, we consider a hierarchical, learned approach for the decentralized planning of multi-agent inspection of a tumbling target. Our solution consists of two components: a viewpoint or high-level planner trained using deep reinforcement learning and a navigation planner handling point-to-point navigation between pre-specified viewpoints. We present a novel problem formulation and methodology that is suitable not only to reinforcement learning-derived robust policies, but extendable to unknown target geometries and higher-fidelity information-theoretic objectives received directly from sensor inputs. Operating under limited information, our trained multi-agent high-level policies successfully contextualize information within the global hierarchical environment and are correspondingly able to inspect over 90% of non-convex tumbling targets, even in the absence of additional agent attitude control.
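
The two-level structure described in the abstract can be illustrated with a toy coverage loop. This is a minimal sketch under loud assumptions, not the authors' method: the target is a unit sphere (convex, unlike the paper's targets), the tumble is a single-axis spin, the six viewpoints and the 60-degree visibility cone are invented for illustration, the high-level planner is a greedy placeholder standing in for the trained deep RL policy, and the low-level navigation planner is replaced by the assumption that an agent reaches its chosen viewpoint within one step. All function names and parameters are hypothetical.

```python
import math

def fibonacci_sphere(n):
    """Roughly uniform points on a unit sphere (stand-in for target surface points)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts

def rotate_z(p, angle):
    """Rotate a point about the z-axis (assumed single-axis spin, not a full tumble)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

def visible(normal, viewpoint, cos_limit=0.5):
    """Far-field test: the surface normal lies within a 60-degree cone of the viewpoint direction."""
    dist = math.sqrt(sum(v * v for v in viewpoint))
    cos_angle = sum(n * v for n, v in zip(normal, viewpoint)) / dist
    return cos_angle >= cos_limit

def newly_seen(viewpoint, body_points, seen):
    """Indices of not-yet-inspected points visible from a viewpoint."""
    return [i for i, p in enumerate(body_points)
            if not seen[i] and visible(p, viewpoint)]

def high_level_policy(viewpoints, body_points, seen):
    """Placeholder for the learned viewpoint planner: greedily pick the
    viewpoint index that exposes the most uninspected points."""
    return max(range(len(viewpoints)),
               key=lambda k: len(newly_seen(viewpoints[k], body_points, seen)))

def run_inspection(n_agents=2, n_points=200, steps=30, spin_rate=0.3, radius=5.0):
    """Return the fraction of surface points inspected after `steps` decisions per agent."""
    viewpoints = [(radius, 0, 0), (-radius, 0, 0), (0, radius, 0),
                  (0, -radius, 0), (0, 0, radius), (0, 0, -radius)]
    points = fibonacci_sphere(n_points)
    seen = [False] * n_points
    for t in range(steps):
        # Target orientation at this step, expressed in the frame of the viewpoints.
        body_points = [rotate_z(p, spin_rate * t) for p in points]
        for _ in range(n_agents):
            k = high_level_policy(viewpoints, body_points, seen)
            # Low-level navigation placeholder: assume the agent arrives at
            # viewpoints[k] within this step, then record what it exposes.
            for i in newly_seen(viewpoints[k], body_points, seen):
                seen[i] = True
    return sum(seen) / n_points

if __name__ == "__main__":
    print(f"coverage: {run_inspection():.2f}")
```

The greedy placeholder reads the full inspection state, whereas the paper's policies operate under limited information, so the coverage this toy reports says nothing about the paper's results. Its only purpose is to show where a learned viewpoint policy and a point-to-point navigation planner would plug into the loop.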

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14188/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/2302.14188/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/2302.14188/full.md

---
Source: https://tomesphere.com/paper/2302.14188