Co-Training an Observer and an Evading Target
André Brandenburger, Folker Hoffmann, Alexander Charlish

TL;DR
This paper applies multi-agent reinforcement learning with PPO to UAV sensor management. It jointly trains adaptive protagonist (observer) and antagonist (evading target) policies, reports that these outperform baseline methods, and investigates explainability.
Contribution
It presents a multi-agent RL framework for UAV tracking that generates policies on the fly and applies explainable AI (XAI) techniques for interpretability.
Findings
Outperforms baseline methods in UAV tracking tasks
Generates adaptive policies that improve robustness
Provides interpretability through feature saliency and decision trees
Abstract
Reinforcement learning (RL) is already widely used in areas such as robotics, but only sparsely in sensor management. In this paper, we apply the popular Proximal Policy Optimization (PPO) approach to a multi-agent UAV tracking scenario. While recorded data of real scenarios can accurately reflect the real world, the required amount of data is not always available. Simulation data, in contrast, is typically cheap to generate, but the simulated target behavior is often naive and only vaguely represents the real world. We use multi-agent RL to jointly generate protagonist and antagonist policies and thereby overcome the data generation problem, since the policies are generated on the fly and adapt continuously. This way, we are able to clearly outperform baseline methods and robustly generate competitive policies. In addition, we investigate explainable…
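To make the PPO reference in the abstract concrete, the following is a minimal sketch of the standard PPO clipped surrogate objective, evaluated once for each of two opposing agents. It is not the authors' implementation. The function name `ppo_clip_objective`, the sample log-probabilities, and the zero-sum assumption (the target's advantages are the negated observer advantages) are illustrative assumptions; this summary does not state the paper's reward design.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. eps is the clipping range.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Hypothetical batch for the observer (protagonist).
observer_adv = [1.0, -0.5, 2.0]
observer_old_logp = [math.log(0.5)] * 3
observer_new_logp = [math.log(0.75), math.log(0.5), math.log(0.25)]

# Assumption: a zero-sum game, so the target (antagonist) sees the
# negated advantages. Each agent has its own policy and log-probs.
target_adv = [-a for a in observer_adv]
target_old_logp = [math.log(0.5)] * 3
target_new_logp = [math.log(0.5), math.log(0.75), math.log(0.25)]

observer_obj = ppo_clip_objective(observer_new_logp, observer_old_logp, observer_adv)
target_obj = ppo_clip_objective(target_new_logp, target_old_logp, target_adv)
print(observer_obj, target_obj)
```

In a co-training loop, each agent would take a gradient step on its own objective using rollouts collected against the other agent's current policy, so neither side trains against a fixed, naive opponent.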
