DeTrack: A Benchmark and Altitude-Aware Dual World Model for Drone-embodied Tracking
Guyue Hu, Haoming Liu, Siyuan Song, Chenglong Li, Feng Chen, and Jin Tang

TL;DR
This paper introduces DeTrack, a drone-embodied tracking task and benchmark with 11,368 target trajectories, and proposes AaDWorlds, an altitude-aware dual world model framework for active drone tracking in interactive 3D environments.
Contribution
The paper presents a large-scale drone-embodied tracking benchmark and a novel altitude-aware dual world model framework for improved active drone tracking.
Findings
AaDWorlds improves tracking accuracy across diverse scenes.
The benchmark enables comprehensive evaluation of drone tracking methods.
Altitude-aware modeling reduces conflicts between visibility and safety.
Abstract
Aerial object tracking has broad applications in public safety, emergency rescue, wildlife monitoring, and related fields. However, existing aerial tracking benchmarks are mainly based on passive 2D video sequences captured from fixed camera locations or predefined flight paths, where drones are treated as passive cameras rather than embodied agents that actively perceive, interact, and control their motion in dynamic 3D scenes. In this paper, we define a new drone-embodied tracking task, termed DeTrack, which requires a drone to track a target in interactive 3D environments using online egocentric observations and active flight control in a closed loop. We build a large-scale benchmark containing 11,368 target trajectories across diverse scenes, rendering conditions, semantic regions, and moving distractors, together with evaluation metrics for target visibility, tracking accuracy, and…
