The Solution for the GAIIC2024 RGB-TIR object detection Challenge
Xiangyu Wu, Jinling Xu, Longfei Huang, Yang Yang

TL;DR
This paper presents a lightweight YOLOv9-based solution with multi-level auxiliary branches and a feature-level fusion module for RGB-TIR object detection in UAV scenarios, addressing complex backgrounds and calibration issues.
Contribution
It introduces a lightweight YOLOv9-based model with multi-level auxiliary branches and a feature-level fusion module for more robust RGB-TIR object detection from UAV imagery.
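In YOLOv9, auxiliary branches only supply extra supervision during training and can be removed at inference, so they add no inference cost. The sketch below shows one common way such multi-level auxiliary losses are combined with the main-head loss: a weighted sum. The function name `combined_detection_loss`, the default per-level weight of 0.25, and the number of levels are hypothetical choices for illustration. They are not taken from this paper.

```python
from typing import Optional, Sequence


def combined_detection_loss(
    main_loss: float,
    aux_losses: Sequence[float],
    aux_weights: Optional[Sequence[float]] = None,
) -> float:
    """Hypothetical training objective: main-head loss plus weighted
    losses from several auxiliary branches (one per feature level).

    Auxiliary branches only contribute gradient signal during training;
    at inference they are discarded and only the main head is run.
    """
    if aux_weights is None:
        # Hypothetical default: the same small weight for every level.
        aux_weights = [0.25] * len(aux_losses)
    if len(aux_weights) != len(aux_losses):
        raise ValueError("need exactly one weight per auxiliary loss")
    weighted_aux = sum(w * l for w, l in zip(aux_weights, aux_losses))
    return main_loss + weighted_aux


# Usage: one main loss and three auxiliary levels (illustrative numbers).
total = combined_detection_loss(1.0, [0.8, 0.6, 0.4])
print(total)
```

Keeping the auxiliary weights below 1 is a design choice that lets the extra branches guide training without dominating the main objective.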
Findings
Achieved mAP scores of 0.516 and 0.543 on benchmarks A and B.
Maintained the highest inference speed among competing models.
Enhanced robustness to complex backgrounds and calibration issues.
Abstract
This report presents a solution to the task of RGB-TIR object detection from the perspective of unmanned aerial vehicles (UAVs). Unlike traditional object detection methods, RGB-TIR object detection uses both RGB and TIR images so that each modality can supply information the other lacks. From a UAV viewpoint, the main challenges are highly complex image backgrounds, frequent changes in lighting, and uncalibrated RGB-TIR image pairs. To address these challenges at the model level, we used a lightweight YOLOv9 model with extended multi-level auxiliary branches that improve the model's robustness, making it better suited to practical UAV applications. For image fusion in RGB-TIR detection, we incorporated a fusion module into the backbone network to fuse images at the feature level, implicitly addressing…
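This summary does not describe the internals of the paper's fusion module. The sketch below therefore shows a generic feature-level fusion pattern instead: concatenate the RGB and TIR feature maps along the channel axis, then apply a 1x1 convolution, which is a per-pixel linear mix of channels. It is written in plain Python on nested lists so it runs without a deep-learning framework. The names `concat_channels`, `conv1x1`, and `fuse_features` are hypothetical, and this concat-plus-1x1 design is an assumed stand-in, not the authors' module.

```python
from typing import List

# Feature map indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def concat_channels(rgb: FeatureMap, tir: FeatureMap) -> FeatureMap:
    """Stack RGB and TIR feature maps along the channel axis."""
    if len(rgb[0]) != len(tir[0]) or len(rgb[0][0]) != len(tir[0][0]):
        raise ValueError("RGB and TIR feature maps must share spatial size")
    return rgb + tir


def conv1x1(x: FeatureMap, weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution: each output channel is a per-pixel weighted sum
    of all input channels plus a bias. weights[o][i] maps input i to output o."""
    height, width = len(x[0]), len(x[0][0])
    out: FeatureMap = []
    for o, w_row in enumerate(weights):
        if len(w_row) != len(x):
            raise ValueError("each weight row must have one entry per input channel")
        plane = [
            [bias[o] + sum(w_row[i] * x[i][r][c] for i in range(len(x))) for c in range(width)]
            for r in range(height)
        ]
        out.append(plane)
    return out


def fuse_features(rgb: FeatureMap, tir: FeatureMap,
                  weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """Hypothetical feature-level fusion: concatenate, then mix channels."""
    return conv1x1(concat_channels(rgb, tir), weights, bias)


# Usage: one-channel 2x2 maps from each modality, averaged into one channel.
rgb_feat = [[[1.0, 2.0], [3.0, 4.0]]]
tir_feat = [[[10.0, 20.0], [30.0, 40.0]]]
fused = fuse_features(rgb_feat, tir_feat, weights=[[0.5, 0.5]], bias=[0.0])
print(fused)  # [[[5.5, 11.0], [16.5, 22.0]]]
```

In a real backbone the weights would be learned, and fusion would usually happen on downsampled feature maps rather than raw pixels.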
