Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification
Syeda Nyma Ferdous, Xin Li, Siwei Lyu

TL;DR
This paper introduces an uncertainty-aware multitask Pyramid Vision Transformer model for UAV-based object re-identification, effectively handling varying camera parameters and intraclass variations in aerial surveillance images.
Contribution
It proposes a multitask learning approach for UAV-based object ReID that uses the convolution-free, multiscale Pyramid Vision Transformer (PVT) as its backbone and adds uncertainty modeling.
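The abstract is cut off before it describes the uncertainty model, so the sketch below does not reproduce the authors' formulation. It illustrates one widely used way to weigh multitask losses by learned uncertainty, the homoscedastic weighting of Kendall, Gal and Cipolla (2018), swapped in here as an assumption. Each task i gets a learnable log-variance s_i; its loss is scaled by 0.5·exp(-s_i) and regularized by 0.5·s_i. The function names are hypothetical, and losses are plain floats rather than autograd tensors to keep the example dependency-free.

```python
import math
from typing import Sequence


def uncertainty_weighted_loss(task_losses: Sequence[float],
                              log_vars: Sequence[float]) -> float:
    """Combine per-task losses using learned homoscedastic uncertainty.

    Assumed form (Kendall et al., 2018, regression-style term for every task):
        total = sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i,  with s_i = log(sigma_i^2)
    A large s_i (high uncertainty) shrinks that task's weight, while the
    0.5 * s_i penalty stops s_i from growing without bound.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task")
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += 0.5 * math.exp(-s) * loss + 0.5 * s
    return total


def grad_wrt_log_var(loss: float, s: float) -> float:
    """Derivative of 0.5*exp(-s)*loss + 0.5*s with respect to s.

    It vanishes at s* = log(loss), i.e. the learned task weight exp(-s*)
    settles at 1 / loss when the task loss is held fixed.
    """
    return -0.5 * math.exp(-s) * loss + 0.5


# Usage: two hypothetical task losses with untrained (zero) log-variances.
print(uncertainty_weighted_loss([2.0, 0.5], [0.0, 0.0]))  # 0.5*2.0 + 0.5*0.5 = 1.25

# Gradient descent on s for a fixed task loss of 4.0 converges to log(4).
s = 0.0
for _ in range(200):
    s -= 0.5 * grad_wrt_log_var(4.0, s)
print(round(s, 6), round(math.log(4.0), 6))
```

The design choice this illustrates: instead of hand-tuning fixed weights for each task, the network learns how much to trust each one, and tasks whose losses stay large (noisier supervision) are automatically down-weighted.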
Findings
Effective on PRAI and VRAI datasets
Outperforms existing methods in UAV ReID tasks
Demonstrates robustness to camera parameter variations
Abstract
Object Re-IDentification (ReID), one of the most significant problems in biometrics and surveillance systems, has been extensively studied by the image processing and computer vision communities over the past decades. Learning a robust and discriminative feature representation is a crucial challenge for object ReID. The problem is even more challenging in Unmanned Aerial Vehicle (UAV)-based ReID, as the images are characterized by the continuously varying camera parameters (e.g., view angle, altitude) of a flying drone. To address this challenge, multiscale feature representation has been considered to characterize images captured from UAVs flying at different altitudes. In this work, we propose a multitask learning approach that employs a new multiscale architecture without convolution, the Pyramid Vision Transformer (PVT), as the backbone for UAV-based object ReID. By uncertainty modeling…
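The abstract credits the PVT backbone with providing a multiscale representation. The arithmetic below sketches how a PVT-style pyramid produces feature maps at four resolutions while keeping attention affordable: each stage downsamples with a patch embedding, and its spatial-reduction attention (SRA) shrinks keys and values by a ratio R_i in each spatial dimension. The stage configuration (patch sizes 4, 2, 2, 2; widths 64, 128, 320, 512; reduction ratios 8, 4, 2, 1, as in PVT-Small) and the 256x128 input size are assumptions for illustration, not settings reported by this paper; `Stage` and `pyramid_shapes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    patch_size: int  # stride of this stage's patch embedding
    channels: int    # embedding dimension C_i
    sr_ratio: int    # spatial-reduction ratio R_i used for keys/values


# Assumed PVT-Small-like configuration (PVT v1); not taken from this paper.
PVT_SMALL_STAGES = [
    Stage(patch_size=4, channels=64, sr_ratio=8),
    Stage(patch_size=2, channels=128, sr_ratio=4),
    Stage(patch_size=2, channels=320, sr_ratio=2),
    Stage(patch_size=2, channels=512, sr_ratio=1),
]


def pyramid_shapes(height: int, width: int,
                   stages: List[Stage]) -> List[Tuple[int, int, int, int, int]]:
    """Return (H_i, W_i, C_i, num_queries, num_keys) for every stage."""
    shapes = []
    h, w = height, width
    for st in stages:
        if h % st.patch_size or w % st.patch_size:
            raise ValueError("input size must be divisible by every patch size")
        # Patch embedding downsamples the feature map by patch_size.
        h, w = h // st.patch_size, w // st.patch_size
        queries = h * w
        # SRA attends from all tokens to a map reduced by R_i per dimension.
        keys = (h // st.sr_ratio) * (w // st.sr_ratio)
        shapes.append((h, w, st.channels, queries, keys))
    return shapes


for h, w, c, q, k in pyramid_shapes(256, 128, PVT_SMALL_STAGES):
    print(f"{h}x{w}x{c}: {q} queries attend to {k} keys")
```

Under these assumptions the pyramid yields feature maps at strides 4, 8, 16 and 32 (64x32, 32x16, 16x8 and 8x4), and every stage's queries attend to only 32 reduced keys. In stage 1 that is 2048 x 32 attention scores rather than 2048 x 2048, which is what makes a high-resolution, multiscale transformer backbone practical.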
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Infrared Target Detection Methodologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Dropout · Vision Transformer · Residual Connection
