Reinforcement Learning Trained Observer Control for Bearings-Only Tracking

Branko Ristic; Sanjeev Arulampalam

arXiv:2605.02120·cs.AI·May 5, 2026

TL;DR

This paper presents a deep reinforcement learning approach to observer control for autonomous bearings-only target tracking. Its reward function trades off estimation accuracy against filter consistency by interpolating between the two objectives on the Pareto front.

Contribution

It formulates the observer manoeuvre problem as a belief MDP and introduces an RL-based observer control policy, implemented as a DQN, trained to optimize tracking performance and robustness.

Findings

01

The DQN policy with $\beta=0.7$ outperforms baselines in accuracy and worst-case error.

02

The approach achieves comparable mean accuracy to information-theoretic methods.

03

It reduces worst-case tracking error by nearly a factor of ten.

Abstract

This paper develops a deep reinforcement learning based observer control policy for autonomous bearings-only tracking of a moving target. The observer manoeuvre problem is formulated as a belief Markov decision process, where the belief state is represented by the posterior of a cubature Kalman filter (CKF). The reward function is designed to address two conflicting objectives: minimising the absolute target position estimation error (Euclidean distance) and maintaining CKF estimation consistency (Mahalanobis distance). The reward is formulated as a geometric interpolation between the two objectives on the Pareto front, parametrised by a weighting factor $\beta \in [0, 1]$. The policy is implemented as a deep Q-network (DQN) trained over 50,000 episodes. Performance is evaluated over 5,000 Monte Carlo episodes and compared against two baselines: the perpendicular-to-bearing heuristic and…
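The abstract excerpt names the two distances behind the reward but does not give the formula that combines them. The following is a minimal sketch of one possible reading, assuming that "geometric interpolation" means a weighted geometric mean of the two distances and that $\beta$ weights the Euclidean term. Both are assumptions, not confirmed by the source, and the function names are hypothetical.

```python
import numpy as np


def euclidean_error(est_pos, true_pos):
    """Absolute position error: Euclidean distance between estimate and truth."""
    diff = np.asarray(est_pos, dtype=float) - np.asarray(true_pos, dtype=float)
    return float(np.linalg.norm(diff))


def mahalanobis_distance(est_pos, true_pos, cov):
    """Consistency measure: position error scaled by the filter's covariance."""
    diff = np.asarray(est_pos, dtype=float) - np.asarray(true_pos, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # solve(cov, diff) computes cov^{-1} diff without forming the inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def interpolated_cost(d_euclid, d_mahal, beta):
    """ASSUMED form: weighted geometric mean d_E**beta * d_M**(1 - beta).

    beta = 1 recovers the Euclidean error alone and beta = 0 the
    Mahalanobis distance alone. The paper's actual reward may differ.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return d_euclid ** beta * d_mahal ** (1.0 - beta)


# Illustrative numbers only, not taken from the paper.
est = [3.0, 4.0]                      # estimated target position
truth = [0.0, 0.0]                    # true target position
cov = [[4.0, 0.0], [0.0, 4.0]]        # position block of the filter covariance

d_e = euclidean_error(est, truth)
d_m = mahalanobis_distance(est, truth, cov)
reward = -interpolated_cost(d_e, d_m, beta=0.7)  # negate so lower cost is better
```

Under this reading, a value such as $\beta = 0.7$ leans towards the absolute position error while still penalising a filter whose covariance understates its actual error.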
