DARTer: Dynamic Adaptive Representation Tracker for Nighttime UAV Tracking
Xuzhao Li, Xuchen Li, Shiyu Hu

TL;DR
DARTer is an end-to-end nighttime UAV tracking framework that fuses multi-perspective features and adaptively activates transformer layers, improving robustness and efficiency under challenging illumination and viewpoint conditions.
Contribution
The paper introduces DARTer, a novel tracker that employs dynamic feature fusion and adaptive transformer activation, reducing computational costs while enhancing tracking performance in nighttime UAV scenarios.
Findings
Outperforms state-of-the-art trackers on multiple benchmarks.
Effectively balances tracking accuracy and computational efficiency.
Eliminates complex multi-task loss functions for streamlined training.
Abstract
Nighttime UAV tracking presents significant challenges due to extreme illumination variations and viewpoint changes, which severely degrade tracking performance. Existing approaches either rely on light enhancers with high computational costs or introduce redundant domain adaptation mechanisms, failing to fully utilize the dynamic features in varying perspectives. To address these issues, we propose DARTer (Dynamic Adaptive Representation Tracker), an end-to-end tracking framework designed for nighttime UAV scenarios. DARTer leverages a Dynamic Feature Blender (DFB) to effectively fuse multi-perspective nighttime features from static and dynamic templates, enhancing representation robustness. Meanwhile, a Dynamic Feature Activator (DFA) adaptively activates Vision Transformer layers based on extracted features, significantly improving…
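The two ideas named in the abstract, blending static and dynamic template features and activating layers conditionally, can be illustrated with a toy sketch. This is not the paper's implementation: the convex blend, the mean-pooled sigmoid gate, the threshold, and all function names below are assumptions chosen only to show the general pattern of skipping layers based on the current features.

```python
import math


def mean_pool(tokens):
    """Average a list of token vectors into a single vector."""
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def blend_templates(static_tokens, dynamic_tokens, alpha=0.5):
    """Hypothetical stand-in for feature blending: a per-token convex mix
    of static-template and dynamic-template features."""
    return [
        [alpha * s + (1.0 - alpha) * d for s, d in zip(st, dt)]
        for st, dt in zip(static_tokens, dynamic_tokens)
    ]


def run_with_adaptive_layers(tokens, layers, gate_weights, threshold=0.5):
    """Apply each layer only when its gate score exceeds the threshold.

    The gate (assumed here) is a sigmoid over a dot product between a
    per-layer weight vector and the mean-pooled tokens. Skipped layers
    leave the tokens unchanged and cost nothing.
    """
    executed = []
    for idx, (layer, w) in enumerate(zip(layers, gate_weights)):
        pooled = mean_pool(tokens)
        score = sigmoid(sum(wi * pi for wi, pi in zip(w, pooled)))
        if score > threshold:
            tokens = layer(tokens)
            executed.append(idx)
    return tokens, executed


def add_one(tokens):
    """Toy 'layer' standing in for a transformer block."""
    return [[v + 1.0 for v in t] for t in tokens]


# One 2-dimensional token per template, purely for illustration.
fused = blend_templates([[1.0, 1.0]], [[3.0, 3.0]], alpha=0.5)
layers = [add_one, add_one, add_one]
gate_weights = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0]]  # middle gate closes
out, executed = run_with_adaptive_layers(fused, layers, gate_weights)
print(executed, out)
```

In this toy run the middle layer's gate score falls below the threshold, so only two of the three layers execute. A learned gate in a real tracker would be trained jointly with the backbone; how DARTer does this is not specified in the text above.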
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Robotics and Sensor-Based Localization
Methods: Attention Is All You Need · RoIAlign · Linear Layer · Multi-Head Attention · Dense Connections · Layer Normalization · RoIPool · Byte Pair Encoding · Label Smoothing · Adam
