Instance-level Visual Active Tracking with Occlusion-Aware Planning
Haowei Sun, Kai Zhou, Hao Gao, Shiteng Zhang, Jinwu Hu, Xutao Wen, Qixiang Ye, Mingkui Tan

TL;DR
This paper introduces OA-VAT, a pipeline for instance-level visual active tracking that handles visually similar distractors and occlusions with three dedicated modules and a new dataset, reporting state-of-the-art results at real-time speed.
Contribution
The paper presents a unified pipeline with three modules: instance-aware offline prototype initialization, online prototype enhancement, and occlusion-aware trajectory planning.
Findings
Achieves 0.93 SR on UnrealCV, outperforming SOTA by 2.2%.
Attains 90.8% CAR on real-world datasets, surpassing SOTA by 12.1%.
Runs at 35 FPS on an RTX 3090, enabling real-time deployment.
Abstract
Visual Active Tracking (VAT) aims to control cameras to follow a target in 3D space, which is critical for applications like drone navigation and security surveillance. However, it faces two key bottlenecks in real-world deployment: confusion from visually similar distractors caused by insufficient instance-level discrimination and severe failure under occlusions due to the absence of active planning. To address these, we propose OA-VAT, a unified pipeline with three complementary modules. First, a training-free Instance-Aware Offline Prototype Initialization aggregates multi-view augmented features via DINOv3 to construct discriminative instance prototypes, mitigating distractor confusion. Second, an Online Prototype Enhancement Tracker enhances prototypes online and integrates a confidence-aware Kalman filter for stable tracking under appearance and motion changes. Third, an…
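The first two modules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mean-pooling aggregation, the confidence-gated exponential moving average, the scalar Kalman update, and every function name and parameter (`build_prototype`, `enhance_prototype`, `kalman_update`, `momentum`, `min_conf`, `base_R`) are assumptions made for illustration. Feature vectors are plain NumPy arrays standing in for the DINOv3 features the paper uses.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def build_prototype(view_features):
    """Average per-view features (n_views, dim) into one unit-norm prototype.

    Mean pooling is an assumed aggregation rule, not taken from the paper.
    """
    feats = l2_normalize(view_features)
    return l2_normalize(feats.mean(axis=0))

def match(prototype, candidate_features):
    """Return the index of the most similar candidate and all cosine similarities."""
    sims = l2_normalize(candidate_features) @ prototype
    return int(np.argmax(sims)), sims

def enhance_prototype(prototype, feature, confidence, momentum=0.9, min_conf=0.5):
    """Assumed online update: confidence-gated exponential moving average.

    Frames below min_conf leave the prototype unchanged, so an occluded or
    mismatched detection does not contaminate it.
    """
    if confidence < min_conf:
        return prototype
    blended = momentum * prototype + (1.0 - momentum) * l2_normalize(feature)
    return l2_normalize(blended)

def kalman_update(x, P, z, confidence, base_R=1.0, eps=1e-3):
    """Scalar Kalman measurement update with confidence-scaled noise.

    The measurement noise R grows as confidence falls, so uncertain
    detections pull the state estimate less (an assumed weighting scheme).
    """
    R = base_R / max(confidence, eps)
    K = P / (P + R)                      # Kalman gain
    return x + K * (z - x), (1.0 - K) * P
```

In this sketch, a detection that matches the prototype with high similarity both refreshes the prototype and moves the filtered position strongly toward the measurement, while a low-similarity detection does neither.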
