Instance-level Visual Active Tracking with Occlusion-Aware Planning

Haowei Sun; Kai Zhou; Hao Gao; Shiteng Zhang; Jinwu Hu; Xutao Wen; Qixiang Ye; Mingkui Tan

arXiv:2604.21453·cs.CV·April 24, 2026

TL;DR

This paper introduces OA-VAT, a unified pipeline for instance-level visual active tracking. It handles visually similar distractors and occlusions with three complementary modules, is accompanied by a new dataset, and reports state-of-the-art results at real-time speed.

Contribution

The paper presents a unified pipeline with three modules: instance-aware offline prototype initialization, online prototype enhancement, and occlusion-aware trajectory planning.
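To make the prototype idea concrete, here is a minimal sketch of the first two modules under stated assumptions. It is not the authors' implementation: the feature vectors stand in for backbone embeddings, and the function names (`init_prototype`, `enhance_prototype`, `match`), the mean aggregation, the exponential moving average, and the `momentum` and `min_conf` values are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def init_prototype(view_features):
    """Build one prototype by averaging features of several augmented views.

    Assumption: simple mean aggregation; the paper may aggregate differently.
    """
    feats = l2_normalize(np.asarray(view_features, dtype=np.float64))
    return l2_normalize(feats.mean(axis=0))


def enhance_prototype(prototype, new_feature, confidence,
                      momentum=0.9, min_conf=0.5):
    """Update the prototype online with an exponential moving average.

    Assumption: low-confidence matches are skipped so that a distractor
    or an occluded view does not contaminate the prototype.
    """
    if confidence < min_conf:
        return prototype
    feat = l2_normalize(np.asarray(new_feature, dtype=np.float64))
    return l2_normalize(momentum * prototype + (1.0 - momentum) * feat)


def match(prototype, candidate_features):
    """Return the index of the most similar candidate and all similarities."""
    cands = l2_normalize(np.asarray(candidate_features, dtype=np.float64))
    sims = cands @ prototype
    return int(np.argmax(sims)), sims
```

In this sketch the tracker would call `match` on the per-frame candidate features, then pass the winning feature and its similarity to `enhance_prototype`.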

Findings

01

Achieves 0.93 SR on UnrealCV, outperforming SOTA by 2.2%.

02

Attains 90.8% CAR on real-world datasets, surpassing SOTA by 12.1%.

03

Runs at 35 FPS on RTX 3090, enabling real-time deployment.

Abstract

Visual Active Tracking (VAT) aims to control cameras to follow a target in 3D space, which is critical for applications like drone navigation and security surveillance. However, it faces two key bottlenecks in real-world deployment: confusion from visually similar distractors caused by insufficient instance-level discrimination and severe failure under occlusions due to the absence of active planning. To address these, we propose OA-VAT, a unified pipeline with three complementary modules. First, a training-free Instance-Aware Offline Prototype Initialization aggregates multi-view augmented features via DINOv3 to construct discriminative instance prototypes, mitigating distractor confusion. Second, an Online Prototype Enhancement Tracker enhances prototypes online and integrates a confidence-aware Kalman filter for stable tracking under appearance and motion changes. Third, an…
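The abstract does not specify how the Kalman filter uses confidence, so the following is only one plausible reading, sketched under assumptions: a constant-velocity filter over 2D image position whose measurement noise grows as the detection confidence drops. The class name `ConfidenceKalman`, the state layout, and the `base_meas_var / confidence` scaling are hypothetical and not taken from the paper.

```python
import numpy as np


class ConfidenceKalman:
    """Constant-velocity Kalman filter with confidence-scaled measurement noise.

    State is [x, y, vx, vy]; only the position [x, y] is observed.
    """

    def __init__(self, dt=1.0, process_var=1e-2, base_meas_var=1.0):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = process_var * np.eye(4)
        self.base_meas_var = base_meas_var
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def predict(self):
        """Propagate the state one step with the motion model."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z, confidence):
        """Fuse a position measurement; low confidence inflates the noise R."""
        c = float(np.clip(confidence, 1e-3, 1.0))
        R = (self.base_meas_var / c) * np.eye(2)  # assumed scaling rule
        y = np.asarray(z, dtype=float) - self.H @ self.x  # innovation
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()
```

With this scaling, a confident detection pulls the estimate strongly toward the measurement, while an uncertain one (for example during partial occlusion) leaves the estimate closer to the motion-model prediction.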
