COXNet: Cross-Layer Fusion with Adaptive Alignment and Scale Integration for RGBT Tiny Object Detection

Peiran Peng; Tingfa Xu; Liqiang Song; Mengqi Zhu; Yuqiang Fang; Jianan Li

arXiv:2508.09533·cs.CV·April 14, 2026


TL;DR

COXNet is an RGBT tiny object detection framework that fuses visible and thermal features across layers, corrects cross-modal spatial misalignment, and improves localization, reporting a 3.32% mAP improvement on the RGBTDronePerson dataset.

Contribution

Introduces COXNet, which combines a Cross-Layer Fusion Module, a Dynamic Alignment and Scale Refinement module, and an optimized label assignment strategy for RGBT tiny object detection.

Findings

01 Achieves a 3.32% mAP improvement on the RGBTDronePerson dataset.

02 Fuses high-level visible features with low-level thermal features.

03 Corrects cross-modal spatial misalignments while preserving multi-scale features.

Abstract

Detecting tiny objects in multimodal Red-Green-Blue-Thermal (RGBT) imagery is a critical challenge in computer vision, particularly in surveillance, search and rescue, and autonomous navigation. Drone-based scenarios exacerbate these challenges due to spatial misalignment, low-light conditions, occlusion, and cluttered backgrounds. Current methods struggle to leverage the complementary information between visible and thermal modalities effectively. We propose COXNet, a novel framework for RGBT tiny object detection, addressing these issues through three core innovations: i) the Cross-Layer Fusion Module, fusing high-level visible and low-level thermal features for enhanced semantic and spatial accuracy; ii) the Dynamic Alignment and Scale Refinement module, correcting cross-modal spatial misalignments and preserving multi-scale features; and iii) an optimized label assignment strategy…
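The idea of pairing high-level visible features with low-level thermal features can be illustrated with a toy example. The sketch below is not the paper's Cross-Layer Fusion Module, whose details are not given in this summary. It assumes, purely for illustration, that the coarse visible map is upsampled by nearest-neighbour repetition and blended with the finer thermal map by a fixed weighted sum. The names `upsample_nearest`, `fuse_cross_layer`, and the weight `alpha` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest(feat: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling: repeat every value `factor` times on both axes."""
    out: Grid = []
    for row in feat:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse_cross_layer(visible_high: Grid, thermal_low: Grid, alpha: float = 0.5) -> Grid:
    """Blend a coarse visible map with a fine thermal map.

    Assumption (not from the paper): fusion is a fixed weighted sum,
    alpha * visible + (1 - alpha) * thermal, after upsampling the visible map.
    """
    factor = len(thermal_low) // len(visible_high)
    if factor < 1:
        raise ValueError("thermal map must be at least as large as the visible map")
    up = upsample_nearest(visible_high, factor)
    if len(up) != len(thermal_low) or len(up[0]) != len(thermal_low[0]):
        raise ValueError("thermal map size must be an integer multiple of the visible map size")
    return [
        [alpha * v + (1.0 - alpha) * t for v, t in zip(v_row, t_row)]
        for v_row, t_row in zip(up, thermal_low)
    ]


# Toy inputs: a 2x2 coarse "semantic" visible map and a 4x4 fine thermal map.
visible_high = [[1.0, 0.0],
                [0.0, 1.0]]
thermal_low = [[1.0] * 4 for _ in range(4)]

fused = fuse_cross_layer(visible_high, thermal_low, alpha=0.5)
for row in fused:
    print(row)
```

In a real detector the two inputs would be multi-channel feature tensors from different backbone stages, and the blend would typically be learned rather than fixed. The sketch only shows the resolution matching that any such cross-layer combination needs.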
