Reinforcement Learning Trained Observer Control for Bearings-Only Tracking
Branko Ristic, Sanjeev Arulampalam

TL;DR
This paper presents a deep reinforcement learning approach for autonomous bearings-only target tracking, balancing estimation accuracy and filter consistency through a Pareto-optimized reward function.
Contribution
It introduces a novel RL-based observer control policy formulated as a belief MDP, trained with a DQN to optimize tracking performance and robustness.
Findings
The DQN policy with the weighting factor set to 0.7 outperforms baselines in accuracy and worst-case error.
The approach achieves comparable mean accuracy to information-theoretic methods.
It reduces worst-case tracking error by nearly a factor of ten.
Abstract
This paper develops a deep reinforcement learning based observer control policy for autonomous bearings-only tracking of a moving target. The observer manoeuvre problem is formulated as a belief Markov decision process, where the belief state is represented by the posterior of a cubature Kalman filter (CKF). The reward function is designed to address two conflicting objectives: minimising the absolute target position estimation error (Euclidean distance) and maintaining CKF estimation consistency (Mahalanobis distance). The reward is formulated as a geometric interpolation between the two objectives on the Pareto front, parametrised by a weighting factor. The policy is implemented as a deep Q-network (DQN) trained over 50,000 episodes. Performance is evaluated over 5,000 Monte Carlo episodes and compared against two baselines: the perpendicular-to-bearing heuristic and…
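The two reward objectives named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact reward formula, so the weighted geometric mean below is only one plausible reading of "geometric interpolation", not the authors' definition. The names `w`, `eps`, `euclidean_error`, `mahalanobis_error`, and `reward` are hypothetical, and the sketch assumes a 2D position estimate with a 2x2 covariance block taken from the filter posterior. The true target position is assumed to be available because training runs in simulation.

```python
import math

def euclidean_error(est_pos, true_pos):
    """Euclidean distance between estimated and true 2D target position."""
    return math.hypot(est_pos[0] - true_pos[0], est_pos[1] - true_pos[1])

def mahalanobis_error(est_pos, true_pos, cov):
    """Mahalanobis distance of the position error under a 2x2 covariance."""
    ex = est_pos[0] - true_pos[0]
    ey = est_pos[1] - true_pos[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # e^T P^{-1} e, using the closed-form inverse of a 2x2 matrix
    q = (d * ex * ex - (b + c) * ex * ey + a * ey * ey) / det
    return math.sqrt(max(q, 0.0))

def reward(est_pos, true_pos, cov, w, eps=1e-9):
    """ASSUMED reward: negated weighted geometric mean of the two distances.

    w = 1 uses only the Euclidean error, w = 0 only the Mahalanobis distance.
    eps keeps the result well defined when either distance is exactly zero.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    d_e = euclidean_error(est_pos, true_pos)
    d_m = mahalanobis_error(est_pos, true_pos, cov)
    return -((d_e + eps) ** w) * ((d_m + eps) ** (1.0 - w))

if __name__ == "__main__":
    estimate = (3.0, 4.0)                 # hypothetical filter position estimate
    truth = (0.0, 0.0)                    # hypothetical simulated true position
    covariance = [[25.0, 0.0], [0.0, 25.0]]
    for weight in (0.0, 0.5, 1.0):
        print(weight, reward(estimate, truth, covariance, weight))
```

Under this assumed form, the weight moves the reward smoothly between a pure accuracy objective and a pure consistency objective, which is the trade-off the abstract describes. The value reported in the findings would correspond to one particular choice of weight.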
