AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning
Xijun Wang, Ruiqi Xian, Tianrui Guan, Celso M. de Melo, Stephen M. Nogar, Aniket Bera, Dinesh Manocha

TL;DR
This paper introduces AZTR, an aerial video action recognition method that combines auto zoom with temporal reasoning. It targets UAV videos on edge devices and reports accuracy gains of 3.2% to 10.4% over state-of-the-art methods across three datasets.
Contribution
The paper presents a new approach combining auto zoom and efficient temporal reasoning for UAV video action recognition, suitable for edge and mobile devices.
Findings
Achieves 6.1-7.4% higher Top-1 accuracy than SOTA on the RoCoG-v2 dataset
Improves on SOTA by 8.3-10.4% on the UAV-Human dataset
Improves on SOTA by 3.2% on the Drone Action dataset
Abstract
We propose a novel approach for aerial video action recognition. Our method is designed for videos captured using UAVs and can run on edge or mobile devices. We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately. This makes it easier to extract the key features and reduces the computational overhead. We also present an efficient temporal reasoning algorithm to capture the action information along the spatial and temporal domains within a controllable computational cost. Our approach has been implemented and evaluated both on the desktop with high-end GPUs and on the low power Robotics RB5 Platform for robots and drones. In practice, we achieve 6.1-7.4% improvement over SOTA in Top-1 accuracy on the RoCoG-v2 dataset, 8.3-10.4% improvement on the UAV-Human dataset and 3.2% improvement on the Drone Action…
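The auto zoom idea in the abstract (locate the human target, then scale it to a useful size) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `auto_zoom`, the `margin` and `out_size` parameters, and the nearest-neighbour resampling are all assumptions made for illustration, and the bounding box is assumed to come from some detector that is not shown.

```python
def auto_zoom(frame, box, out_size=8, margin=0.2):
    """Crop `frame` around `box` and rescale the crop to out_size x out_size.

    Hypothetical helper, not the paper's method. `frame` is a list of rows,
    `box` is (x0, y0, x1, y1) in pixels with exclusive upper bounds.
    """
    h, w = len(frame), len(frame[0])
    x0, y0, x1, y1 = box
    # Pad the box by a relative margin so the target is not clipped.
    mx, my = margin * (x1 - x0), margin * (y1 - y0)
    x0 = max(0, int(round(x0 - mx)))
    y0 = max(0, int(round(y0 - my)))
    x1 = min(w, int(round(x1 + mx)))
    y1 = min(h, int(round(y1 + my)))
    ch, cw = y1 - y0, x1 - x0
    if ch <= 0 or cw <= 0:
        raise ValueError("box does not overlap the frame")
    # Nearest-neighbour resampling of the crop to a fixed output size.
    return [
        [frame[y0 + (i * ch) // out_size][x0 + (j * cw) // out_size]
         for j in range(out_size)]
        for i in range(out_size)
    ]


# Toy aerial frame: a small "person" occupying 50 of 10,000 pixels.
frame = [[0] * 100 for _ in range(100)]
for y in range(40, 50):
    for x in range(45, 50):
        frame[y][x] = 1

zoomed = auto_zoom(frame, (45, 40, 50, 50))
```

In the toy frame the target covers well under 1% of the pixels; after zooming it covers a much larger share of a much smaller input, which is the intuition behind the abstract's claim that auto zoom eases feature extraction and reduces computational overhead.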
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
