TL;DR
This paper explores deep learning methods for UAV audio classification using only 4,500 seconds of data, comparing CNNs and transformers, and highlighting the efficiency and potential of each approach under data scarcity conditions.
Contribution
The paper introduces a small-data training framework for UAV audio classification, built on parameter-efficient fine-tuning and data augmentation, and compares CNNs with transformers under that constraint.
Findings
CNNs outperform transformers by 1-2% accuracy
CNNs are more computationally efficient
Transformers show potential with more data and optimization
Abstract
Unmanned aerial vehicle (UAV) usage is expected to surge in the coming decade, raising the need for heightened security measures to prevent airspace violations and security threats. This study investigates deep learning approaches to UAV classification, focusing on the key issue of data scarcity. To investigate this, we trained the models on a total of 4,500 seconds of audio samples, evenly distributed across a 9-class dataset. We leveraged parameter-efficient fine-tuning (PEFT) and data augmentations to mitigate the data scarcity. This paper implements and compares convolutional neural networks (CNNs) and attention-based transformers. Our results show that CNNs outperform transformers by 1-2% accuracy while still being more computationally efficient. These early findings, however, point to potential in using transformer models, suggesting that with more data and…
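To make the data budget concrete, the sketch below works out the per-class share of the 4,500 seconds and shows two common waveform augmentations. The figures 4,500 and 9 come from the abstract. Everything else is an assumption for illustration: the abstract does not say which augmentations the authors used, and the helper names (seconds_per_class, add_noise, time_shift) and the noise level are hypothetical.

```python
import random

TOTAL_SECONDS = 4500  # total audio, as stated in the abstract
NUM_CLASSES = 9       # number of classes, as stated in the abstract


def seconds_per_class(total_seconds, num_classes):
    """Audio budget per class when the data is evenly distributed."""
    return total_seconds / num_classes


def add_noise(waveform, noise_std=0.005, rng=None):
    """Return a copy of the waveform with zero-mean Gaussian noise added.

    Assumption: additive noise is a generic augmentation, not necessarily
    one used in the paper; noise_std is an arbitrary illustrative value.
    """
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, noise_std) for x in waveform]


def time_shift(waveform, shift):
    """Circularly shift the waveform to the right by `shift` samples.

    Assumption: also a generic augmentation chosen for illustration.
    """
    if not waveform:
        return []
    shift %= len(waveform)
    if shift == 0:
        return list(waveform)
    return waveform[-shift:] + waveform[:-shift]


if __name__ == "__main__":
    print(seconds_per_class(TOTAL_SECONDS, NUM_CLASSES))  # 500.0
    clip = [0.0, 0.1, 0.2, 0.3]
    print(time_shift(clip, 1))  # [0.3, 0.0, 0.1, 0.2]
    print(add_noise(clip, rng=random.Random(0)))
```

At 500 seconds per class, each augmented copy of a clip adds training variety without adding new recordings, which is the usual motivation for augmentation under data scarcity.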
