Co-Training an Observer and an Evading Target

André Brandenburger; Folker Hoffmann; Alexander Charlish

arXiv:2210.11126·cs.RO·October 21, 2022

TL;DR

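This paper applies Proximal Policy Optimization (PPO) in a multi-agent reinforcement learning setup to a UAV tracking scenario in sensor management. Protagonistic and antagonistic policies are trained jointly and adapt to each other continuously, so no recorded data or hand-scripted target behavior is needed. The resulting policies outperform baseline methods, and the authors also investigate explainability techniques.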

Contribution

The paper presents a multi-agent RL framework for UAV tracking in which the policies of both sides are generated on the fly during training, and it applies explainable AI (XAI) techniques to interpret the learned behavior.

Findings

01 Outperforms baseline methods in the UAV tracking scenario

02 Robustly generates competitive policies that adapt continuously during training

03 Supports interpretation of the learned policies through techniques such as feature saliency and decision trees

Abstract

Reinforcement learning (RL) is already widely applied in areas such as robotics, but it is only sparsely used in sensor management. In this paper, we apply the popular Proximal Policy Optimization (PPO) approach to a multi-agent UAV tracking scenario. While recorded data of real scenarios can accurately reflect the real world, the required amount of data is not always available. Simulation data, in contrast, is typically cheap to generate, but the target behavior used is often naive and only vaguely represents the real world. In this paper, we utilize multi-agent RL to jointly generate protagonistic and antagonistic policies and overcome the data generation problem, as the policies are generated on-the-fly and adapt continuously. This way, we are able to clearly outperform baseline methods and robustly generate competitive policies. In addition, we investigate explainable…
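For readers unfamiliar with PPO, the sketch below shows the standard clipped surrogate objective that the method is known for. It is a generic illustration and not the authors' implementation: the function name `ppo_clipped_objective`, the list-based inputs, and the clipping range `eps=0.2` (a commonly used default) are assumptions made here, not details taken from the paper.

```python
def ppo_clipped_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates, one per sample
    eps:        clipping range (0.2 is an assumed, commonly used default)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped_r = max(1.0 - eps, min(r, 1.0 + eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical batch of two samples, for illustration only.
    print(ppo_clipped_objective([1.5, 0.5], [1.0, -1.0]))
```

In a co-training setup such as the one the abstract describes, each side's policy would be updated with its own batch of ratios and advantages. How the paper alternates or schedules those updates is not stated in the text above.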
