RELO: Reinforcement Learning to Localize for Visual Object Tracking

Xin Chen; Chuanyu Sun; Jiao Xu; Houwen Peng; Dong Wang; Huchuan Lu; Kede Ma

arXiv:2605.07379 · cs.CV · May 20, 2026

TL;DR

RELO treats target localization in visual object tracking as a reinforcement learning problem: it replaces handcrafted spatial priors with a learned localization policy whose rewards combine frame-level IoU and sequence-level AUC, and it reports superior results across multiple benchmarks.

Contribution

The paper proposes RELO, a novel reinforcement learning framework for visual object tracking that learns localization policies directly optimized for tracking metrics.

Findings

01

RELO achieves 57.5% AUC on LaSOT_ext without template updates.

02

RELO outperforms prior methods on multiple benchmarks.

03

Layer-aligned temporal token propagation enhances semantic consistency across frames.

Abstract

Conventional visual object trackers localize targets using handcrafted spatial priors, often in the form of heatmaps. Such priors provide only surrogate supervision and are poorly aligned with tracking optimization and evaluation metrics, such as intersection over union (IoU) and area under the success curve (AUC). Here, we introduce RELO, a REinforcement-learning-to-LOcalize method for visual object tracking that formulates target localization as a Markov decision process. Specifically, RELO replaces handcrafted spatial priors with a localization policy learned over spatial positions via reinforcement learning, with rewards combining frame-level IoU and sequence-level AUC. We additionally introduce layer-aligned temporal token propagation to improve semantic consistency across frames, with negligible computational overhead. Across multiple benchmarks, RELO achieves superior results,…
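The two metrics named in the abstract can be made concrete with a minimal Python sketch. It computes IoU for axis-aligned boxes and the area under the success curve using a common benchmark convention (21 evenly spaced overlap thresholds in [0, 1], with a frame counted as a success when its IoU strictly exceeds the threshold). The `combined_rewards` function and its `seq_weight` parameter are hypothetical: the abstract says only that rewards combine frame-level IoU and sequence-level AUC, so the additive weighting here is an assumption for illustration, not the paper's reward.

```python
from typing import List, Sequence, Tuple

# Box format assumed here: (x, y, w, h) with (x, y) the top-left corner.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(ious: Sequence[float], num_thresholds: int = 21) -> float:
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    if not ious:
        return 0.0
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Success rate at threshold t: fraction of frames with IoU strictly above t.
    rates = [sum(v > t for v in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


def combined_rewards(
    pred: Sequence[Box], gt: Sequence[Box], seq_weight: float = 0.5
) -> List[float]:
    """Hypothetical per-frame reward: frame IoU plus a weighted sequence AUC.

    The additive form and seq_weight are assumptions, not taken from the paper.
    """
    ious = [iou(p, g) for p, g in zip(pred, gt)]
    auc = success_auc(ious)
    return [v + seq_weight * auc for v in ious]
```

Because the sequence-level term is shared by every frame, it acts as a trajectory-wide bonus, while the IoU term differs from frame to frame. How RELO actually balances the two is not stated in the abstract.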
