Real-Time Flying Object Detection with YOLOv8
Dillon Reis, Jordan Kupec, Jacqueline Hong, Ahmad Daoudi

TL;DR
This paper trains a generalized YOLOv8 model for real-time flying object detection (79.2% mAP50 at 50 fps), then refines it via transfer learning to reach 99.1% mAP50 on data closer to real-world conditions.
Contribution
The paper introduces a generalized YOLOv8-based model trained on diverse flying objects and refines it via transfer learning for improved real-world detection performance.
Findings
The refined model achieves 99.1% mAP50.
The generalized model reaches 79.2% mAP50 at 50 fps.
Both occluded and very small objects are handled effectively.
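For readers unfamiliar with the metric, mAP50 is the mean over classes of the average precision (AP) obtained when a prediction counts as correct only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal, simplified sketch for a single image, using all-point interpolation of the precision-recall curve. The function names (iou, ap50, map50) and the sample boxes are illustrative assumptions, not the paper's evaluation code, and production evaluators differ in details such as interpolation and cross-image matching.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ap50(preds: List[Tuple[float, Box]], truths: List[Box], thr: float = 0.5) -> float:
    """Average precision for one class in one image at an IoU threshold."""
    if not truths:
        return 0.0
    matched = set()
    hits = []
    # Visit predictions from most to least confident.
    for _score, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, truth in enumerate(truths):
            if j in matched:
                continue  # each ground-truth box can be matched only once
            value = iou(box, truth)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_iou >= thr:
            matched.add(best_j)
            hits.append(True)
        else:
            hits.append(False)
    # Running precision and recall after each prediction.
    tp = 0
    precisions, recalls = [], []
    for k, hit in enumerate(hits, start=1):
        tp += int(hit)
        precisions.append(tp / k)
        recalls.append(tp / len(truths))
    # Make precision non-increasing from right to left (the PR envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the enveloped precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def map50(per_class: Dict[str, Tuple[List[Tuple[float, Box]], List[Box]]]) -> float:
    """Mean of per-class AP at IoU 0.5."""
    scores = [ap50(preds, truths) for preds, truths in per_class.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up example: two ground-truth boxes, three predictions (one false positive).
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
example_ap = ap50(preds, truths)
```

In this made-up example the false positive ranked second lowers the precision at full recall, so the AP falls below 1 even though both objects are eventually found.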
Abstract
This paper presents a generalized model for real-time detection of flying objects that can be used for transfer learning and further research, as well as a refined model that achieves state-of-the-art results for flying object detection. We achieve this by training our first (generalized) model on a data set containing 40 different classes of flying objects, forcing the model to extract abstract feature representations. We then perform transfer learning with these learned parameters on a data set more representative of real world environments (i.e. higher frequency of occlusion, very small spatial sizes, rotations, etc.) to generate our refined model. Object detection of flying objects remains challenging due to large variances of object spatial sizes/aspect ratios, rate of speed, occlusion, and clustered backgrounds. To address some of the presented challenges while simultaneously…
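The two-stage procedure described in the abstract (train a generalized model, then reuse its learned parameters on a more realistic data set) could be sketched with the Ultralytics Python package roughly as follows. This is a sketch under stated assumptions, not the authors' training script: the dataset YAML paths, the yolov8m.pt starting checkpoint, the epoch counts, and the run names are all hypothetical placeholders, and the ultralytics package must be installed before train_two_stage is called.

```python
from pathlib import Path


def best_weights_path(project: str, name: str) -> Path:
    """Location where Ultralytics saves the best checkpoint of a training run."""
    return Path(project) / name / "weights" / "best.pt"


def train_two_stage(
    general_yaml: str,                 # hypothetical: data set with many flying-object classes
    refined_yaml: str,                 # hypothetical: data set closer to real-world conditions
    base_weights: str = "yolov8m.pt",  # assumption: a pretrained YOLOv8 checkpoint
    project: str = "runs",
    epochs_general: int = 100,         # assumption: illustrative value only
    epochs_refined: int = 50,          # assumption: illustrative value only
) -> Path:
    # Imported lazily so this module can be loaded without the package installed.
    from ultralytics import YOLO

    # Stage 1: train the generalized model on the broad data set.
    general = YOLO(base_weights)
    general.train(data=general_yaml, epochs=epochs_general,
                  project=project, name="general", exist_ok=True)

    # Stage 2: transfer learning, starting from the stage-1 weights.
    refined = YOLO(str(best_weights_path(project, "general")))
    refined.train(data=refined_yaml, epochs=epochs_refined,
                  project=project, name="refined", exist_ok=True)

    return best_weights_path(project, "refined")
```

A call such as train_two_stage("general.yaml", "refined.yaml") would return the path of the refined checkpoint, which can then be loaded with YOLO(...) for inference.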
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
Methods: You Only Look Once · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
