Infra-YOLO: Efficient Neural Network Structure with Model Compression for Real-Time Infrared Small Object Detection
Zhonglin Chen, Anyu Geng, Jianan Jiang, Jiwu Lu, Di Wu

TL;DR
Infra-YOLO is an optimized neural network for real-time infrared small object detection in UAV applications. The work introduces novel network modules and a new dataset, and applies model compression techniques.
Contribution
The paper introduces Infra-YOLO, which combines the proposed MSAM and FFAFPM, a new dataset (InfraTiny), and model pruning to advance infrared small object detection and its deployment on embedded devices.
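The summary does not say which pruning criterion Infra-YOLO uses. The sketch below assumes network-slimming-style channel pruning: each output channel is scored by the absolute value of a hypothetical BatchNorm scale factor (gamma), one global threshold is applied across all layers, and the layers form a plain chain (no residual or concatenation constraints). `ConvLayer`, `prune`, and `param_reduction` are illustrative names, not the authors' code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvLayer:
    """One conv layer in a plain chain. `gammas` are hypothetical BN scale
    factors, one per output channel, used as channel-importance scores."""
    in_ch: int
    out_ch: int
    k: int  # square kernel size
    gammas: List[float]

    def params(self) -> int:
        # Count conv weights only; biases and BN parameters are ignored here.
        return self.in_ch * self.out_ch * self.k * self.k


def global_threshold(layers: List[ConvLayer], prune_ratio: float) -> float:
    """Smallest |gamma| that survives when `prune_ratio` of all channels are cut."""
    mags = sorted(abs(g) for layer in layers for g in layer.gammas)
    idx = min(int(len(mags) * prune_ratio), len(mags) - 1)
    return mags[idx]


def prune(layers: List[ConvLayer], prune_ratio: float) -> List[ConvLayer]:
    thr = global_threshold(layers, prune_ratio)
    pruned: List[ConvLayer] = []
    in_ch = layers[0].in_ch  # input image channels are never pruned
    for layer in layers:
        # Channels tied at the threshold are kept.
        keep = [g for g in layer.gammas if abs(g) >= thr]
        if not keep:  # keep at least one channel so the chain stays connected
            keep = [max(layer.gammas, key=abs)]
        pruned.append(ConvLayer(in_ch, len(keep), layer.k, keep))
        in_ch = len(keep)  # the next layer loses the matching input channels
    return pruned


def param_reduction(before: List[ConvLayer], after: List[ConvLayer]) -> float:
    total_before = sum(l.params() for l in before)
    total_after = sum(l.params() for l in after)
    return 1.0 - total_after / total_before


# Prune a toy two-layer chain by 50% of its channels.
layers = [
    ConvLayer(3, 4, 3, [0.9, 0.01, 0.5, 0.02]),
    ConvLayer(4, 4, 3, [0.03, 0.8, 0.04, 0.7]),
]
slim = prune(layers, prune_ratio=0.5)
print([l.out_ch for l in slim])                 # [2, 2]
print(round(param_reduction(layers, slim), 3))  # 0.643
```

Cutting half of the channels removes about 64% of the weights in this toy, because each pruned channel shrinks one layer's output and the next layer's input. This compounding is one reason channel pruning can remove a large share of parameters; the toy numbers do not reproduce the paper's 88% figure.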
Findings
Infra-YOLO improves mAP@0.5 by 2.7% over YOLOv3.
Infra-YOLO achieves 88% parameter reduction with a slight accuracy gain.
MSAM and FFAFPM significantly enhance detection performance.
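mAP@0.5 counts a detection as correct when its IoU with a not-yet-matched ground-truth box is at least 0.5, then averages per-class AP. The paper's exact evaluation code is not given here, so the sketch below assumes the common PASCAL VOC convention (greedy matching by descending score, all-point interpolated AP) for a single class in a single image; `iou` and `average_precision` are illustrative names.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Tuple[float, Box]],
                      gts: Sequence[Box], thr: float = 0.5) -> float:
    """AP for one class in one image; preds are (score, box) pairs."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_flags: List[int] = []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        # Find the ground-truth box this detection overlaps most.
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            o = iou(box, g)
            if o > best:
                best, best_j = o, j
        # TP only if IoU >= thr and that GT is unclaimed; duplicates are FPs.
        if best_j >= 0 and best >= thr and not matched[best_j]:
            matched[best_j] = True
            tp_flags.append(1)
        else:
            tp_flags.append(0)
    # Build the precision/recall curve in score order.
    tp = fp = 0
    recalls, precisions = [], []
    for t in tp_flags:
        tp += t
        fp += 1 - t
        recalls.append(tp / len(gts))
        precisions.append(tp / (tp + fp))
    # All-point interpolation: make precision non-increasing, then integrate.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 31))]
print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 4))  # 0.3333
print(round(average_precision(preds, gts), 4))        # 0.8333
```

For small infrared targets a few pixels of localization error can push IoU below 0.5, which is why the 0.5 threshold is a common choice when reporting results on datasets like InfraTiny.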
Abstract
Although convolutional neural networks have achieved outstanding results in visible-light target detection, infrared small object detection still poses many challenges because of the low signal-to-noise ratio, incomplete object structure, and a lack of reliable infrared small object datasets. To address the limitations of existing infrared small object datasets, a new dataset named InfraTiny was constructed, in which more than 85% of the bounding boxes are smaller than 32×32 pixels (3,218 images with a total of 20,893 bounding boxes). A multi-scale attention mechanism module (MSAM) and a Feature Fusion Augmentation Pyramid Module (FFAFPM) were proposed and deployed onto embedded devices. The MSAM enables the network to obtain scale perception information by acquiring different receptive fields, while suppressing background noise information to enhance feature extraction ability. The proposed FFAFPM can…
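The "smaller than 32×32 pixels" statistic can be computed directly from box sizes. The summary does not say whether it means area below 32² (the COCO "small object" convention) or both sides below 32, so the sketch below, with the illustrative function `small_box_fraction`, supports both as an assumption.

```python
from typing import Iterable, Tuple


def small_box_fraction(sizes: Iterable[Tuple[float, float]],
                       limit: int = 32, per_side: bool = False) -> float:
    """Share of boxes, given as (width, height) in pixels, counted as small.

    per_side=False: area < limit * limit (COCO-style criterion).
    per_side=True:  width < limit and height < limit.
    """
    sizes = list(sizes)
    if not sizes:
        return 0.0
    if per_side:
        small = sum(1 for w, h in sizes if w < limit and h < limit)
    else:
        small = sum(1 for w, h in sizes if w * h < limit * limit)
    return small / len(sizes)


boxes = [(10, 10), (31, 31), (32, 32), (40, 20)]
print(small_box_fraction(boxes))                 # 0.75 (40x20 has area 800 < 1024)
print(small_box_fraction(boxes, per_side=True))  # 0.5

# "More than 85%" of 20,893 boxes means at least this many small boxes:
print(20893 * 85 // 100 + 1)  # 17760
```

The two criteria disagree on elongated boxes such as 40×20, so the definition matters when comparing InfraTiny's statistics with other datasets.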
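The abstract says MSAM gathers scale information from different receptive fields while suppressing background noise, but does not describe its layers. The toy below is a hypothetical, pure-Python illustration of that idea, not the paper's module: each branch estimates the local background with a box filter of a different radius (standing in for convolutions with different receptive fields) and subtracts it, a center-surround contrast that suppresses smooth background. A softmax over each branch's peak response then weights the branches before fusion, loosely in the style of selective-kernel attention. `box_mean`, `center_surround`, and `multi_scale_attention` are illustrative names.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def box_mean(x: Grid, r: int) -> Grid:
    """Mean over a (2r+1)x(2r+1) window, clipped at the borders."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [x[a][b]
                    for a in range(max(0, i - r), min(h, i + r + 1))
                    for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out


def center_surround(x: Grid, r: int) -> Grid:
    """Branch response: pixel minus its local background estimate, clipped at 0."""
    bg = box_mean(x, r)
    return [[max(0.0, x[i][j] - bg[i][j]) for j in range(len(x[0]))]
            for i in range(len(x))]


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def multi_scale_attention(x: Grid, radii=(1, 2, 4)) -> Tuple[Grid, List[float]]:
    branches = [center_surround(x, r) for r in radii]
    # One descriptor per branch: its peak response anywhere in the map.
    desc = [max(max(row) for row in b) for b in branches]
    weights = softmax(desc)
    h, w = len(x), len(x[0])
    fused = [[sum(wt * b[i][j] for wt, b in zip(weights, branches))
              for j in range(w)] for i in range(h)]
    return fused, weights


# A 9x9 uniform background with one bright point target at the center.
x = [[1.0] * 9 for _ in range(9)]
x[4][4] = 10.0
fused, weights = multi_scale_attention(x)
print(fused[0][0], fused[4][4] > 8.0)  # 0.0 True
```

In this toy the uniform background is driven to zero while the point target survives, and the widest surround receives the largest weight because it gives the cleanest background estimate for a one-pixel target.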
Taxonomy
Topics: Infrared Target Detection Methodologies
Methods: Softmax · Attention Is All You Need · Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
