YOLO-SPCI: Enhancing Remote Sensing Object Detection via Selective-Perspective-Class Integration
Xinyuan Wang, Lian Peng, Xiangcheng Li, Yilin He, KinTak U

TL;DR
YOLO-SPCI adds a lightweight Selective-Perspective-Class Integration (SPCI) attention module to YOLOv8 to guide multi-scale feature refinement for remote sensing object detection, and reports outperforming state-of-the-art detectors on the NWPU VHR-10 aerial imagery dataset.
Contribution
The paper presents the SPCI module, combining selective gating, perspective fusion, and class discrimination, integrated into YOLOv8 to enhance feature representation for remote sensing images.
Findings
Outperforms state-of-the-art detectors on the NWPU VHR-10 dataset
Improves multi-scale feature refinement in YOLOv8
Demonstrates effectiveness of SPCI modules in remote sensing detection
Abstract
Object detection in remote sensing imagery remains a challenging task due to extreme scale variation, dense object distributions, and cluttered backgrounds. While recent detectors such as YOLOv8 have shown promising results, their backbone architectures lack explicit mechanisms to guide multi-scale feature refinement, limiting performance on high-resolution aerial data. In this work, we propose YOLO-SPCI, an attention-enhanced detection framework that introduces a lightweight Selective-Perspective-Class Integration (SPCI) module to improve feature representation. The SPCI module integrates three components: a Selective Stream Gate (SSG) for adaptive regulation of global feature flow, a Perspective Fusion Module (PFM) for context-aware multi-scale integration, and a Class Discrimination Module (CDM) to enhance inter-class separability. We embed two SPCI blocks into the P3 and P5 stages…
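The abstract names the Selective Stream Gate but does not describe its internals here. As a rough illustration of what "adaptive regulation of global feature flow" can mean in general, the following minimal sketch applies a generic global-average channel gate to a tiny feature map. This is an assumption for illustration only, not the authors' SSG: the function name `channel_gate` and the per-channel `weights` and `biases` are hypothetical stand-ins for learned parameters.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(feature_map, weights, biases):
    """Scale each channel by a gate derived from its global average.

    feature_map: list of C channels, each an H x W nested list of floats.
    weights, biases: one scalar per channel (hypothetical learned parameters).
    Returns the gated feature map and the list of per-channel gate values.
    """
    gated, gates = [], []
    for channel, w, b in zip(feature_map, weights, biases):
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)      # global average pooling
        g = sigmoid(w * mean + b)             # gate value in (0, 1)
        gates.append(g)
        gated.append([[g * v for v in row] for row in channel])
    return gated, gates


# Toy input: 2 channels, each 2 x 2.
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
gated, gates = channel_gate(fmap, weights=[0.0, 0.0], biases=[0.0, 0.0])
```

With zero weights and biases every gate is 0.5, so each activation is halved. In a trained network the gate parameters would be learned so that informative channels pass through more strongly than others.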
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Remote Sensing and Land Use · Remote-Sensing Image Classification
