Pyramidal Adaptive Cross-Gating for Multimodal Detection
Zidong Gu, Shoufu Tian

TL;DR
This paper introduces PACGNet, an architecture for multimodal object detection in aerial imagery that fuses features through deep, hierarchical gating inside the backbone, improving detection accuracy, especially for small objects.
Contribution
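The paper proposes the Pyramidal Adaptive Cross-Gating Network (PACGNet) with two new modules, Symmetrical Cross-Gating (SCG) and Pyramidal Feature-aware Multimodal Gating (PFMG), for deep, hierarchical multimodal feature fusion that suppresses cross-modal noise and preserves fine-grained detail.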
Findings
Achieved a state-of-the-art mAP50 of 82.2% on the DroneVehicle dataset.
Demonstrated improved detection of small objects in aerial imagery.
Validated effectiveness through extensive experiments.
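For readers unfamiliar with the metric, mAP50 counts a predicted box as a correct detection when its intersection-over-union (IoU) with a ground-truth box of the same class is at least 0.5. The sketch below illustrates only that matching criterion. It assumes axis-aligned boxes given as (x1, y1, x2, y2), which is a simplification chosen for illustration; the function names iou and is_match_at_50 are hypothetical and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box):
    """True when a prediction would count as a hit under the IoU >= 0.5 rule."""
    return iou(pred_box, gt_box) >= 0.5
```

The fixed threshold is harder on small objects: a shift of a few pixels on a small box removes a large share of the overlap, so the same localization error that leaves a large box matched can turn a small box into a miss.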
Abstract
Object detection in aerial imagery is a critical task in applications such as UAV reconnaissance. Although existing methods have extensively explored feature interaction between different modalities, they commonly rely on simple fusion strategies for feature aggregation. This introduces two critical flaws: it is prone to cross-modal noise and disrupts the hierarchical structure of the feature pyramid, thereby impairing the fine-grained detection of small objects. To address this challenge, we propose the Pyramidal Adaptive Cross-Gating Network (PACGNet), an architecture designed to perform deep fusion within the backbone. To this end, we design two core components: the Symmetrical Cross-Gating (SCG) module and the Pyramidal Feature-aware Multimodal Gating (PFMG) module. The SCG module employs a bidirectional, symmetrical "horizontal" gating mechanism to selectively absorb complementary…
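The general idea of bidirectional, symmetrical gating can be sketched as follows. This is a minimal illustration of cross-gating in general, not the authors' SCG module: the abstract is truncated before the design is fully described, so the sigmoid gate, the residual addition, the scalar weights w_rgb and w_ir, and the function name cross_gate are all assumptions made for this example. Features are plain lists of floats to keep the sketch dependency-free.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_gate(rgb, ir, w_rgb=1.0, w_ir=1.0):
    """Hypothetical symmetric cross-gating of two equal-length feature vectors.

    Each modality keeps its own features (residual path) and adds the other
    modality's features scaled by a gate in (0, 1). The gate is computed from
    the modality being absorbed, so weak responses contribute less.
    """
    if len(rgb) != len(ir):
        raise ValueError("feature vectors must have the same length")

    # Gates: how much of each source modality is passed to the other branch.
    gate_from_ir = [sigmoid(w_ir * v) for v in ir]
    gate_from_rgb = [sigmoid(w_rgb * v) for v in rgb]

    # Same rule applied in both directions, hence "symmetrical".
    rgb_out = [r + g * i for r, g, i in zip(rgb, gate_from_ir, ir)]
    ir_out = [i + g * r for i, g, r in zip(ir, gate_from_rgb, rgb)]
    return rgb_out, ir_out


if __name__ == "__main__":
    rgb_feat = [0.0, 0.0]
    ir_feat = [0.0, 2.0]
    print(cross_gate(rgb_feat, ir_feat))
```

In a real network the gates would be produced by learned convolutional layers over feature maps rather than by fixed scalar weights; the sketch only shows the data flow, in which each branch decides element by element how much of the other branch to absorb.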
Taxonomy
Topics: Advanced Neural Network Applications · Infrared Target Detection Methodologies · Advanced Image and Video Retrieval Techniques
