TL;DR
FSDETR is a frequency-spatial feature enhancement framework, built on the RT-DETR baseline, that improves small object detection by combining hierarchical spatial attention, deformable intra-scale feature interaction, and a frequency-spatial feature pyramid network.
Contribution
The paper proposes a novel framework combining frequency filtering and spatial attention mechanisms to enhance small object detection performance.
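To make the idea of frequency filtering on a feature map concrete, here is a minimal sketch that splits a small 2D map into a low-frequency part (smooth context) and a high-frequency residual (edges and fine detail, which is where small objects tend to live). It is a generic illustration and not the paper's module: the function names `dft2` and `split_bands`, the circular cutoff `radius`, and the naive DFT are all assumptions chosen to keep the example dependency-free.

```python
import cmath
import math

def dft2(x, inverse=False):
    """Naive 2D discrete Fourier transform (fine for tiny maps only)."""
    h, w = len(x), len(x[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for r in range(h):
                for c in range(w):
                    angle = 2 * math.pi * (u * r / h + v * c / w)
                    acc += x[r][c] * cmath.exp(sign * 1j * angle)
            out[u][v] = acc / (h * w) if inverse else acc
    return out

def split_bands(feat, radius):
    """Split feat into (low, high) using a circular low-pass mask.

    `radius` is a hypothetical cutoff measured in frequency bins from DC.
    """
    h, w = len(feat), len(feat[0])
    spec = dft2(feat)
    low_spec = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            # Wrapped distance from the DC bin, so the mask is symmetric
            # and the low-pass output stays real-valued.
            fu, fv = min(u, h - u), min(v, w - v)
            if math.hypot(fu, fv) <= radius:
                low_spec[u][v] = spec[u][v]
    low = [[z.real for z in row] for row in dft2(low_spec, inverse=True)]
    # The high band is whatever the low band did not explain.
    high = [[feat[r][c] - low[r][c] for c in range(w)] for r in range(h)]
    return low, high

# A 4x4 map with one bright "small object" pixel.
feat = [[0.0] * 4 for _ in range(4)]
feat[1][2] = 16.0
low, high = split_bands(feat, radius=0)  # radius 0 keeps only the mean
```

With `radius=0` the low band collapses to the map's mean, so the single bright pixel survives almost entirely in the high band. A learned model would typically reweight the two bands before merging them with spatial features, but how FSDETR does this is not described in the text above.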
Findings
Achieves 13.9% AP_S on VisDrone 2019 with 14.7M parameters.
Attains 48.95% AP50^tiny on the TinyPerson dataset.
Outperforms existing methods on small-object benchmarks.
Abstract
Small object detection remains a significant challenge due to feature degradation from downsampling, mutual occlusion in dense clusters, and complex background interference. To address these issues, this paper proposes FSDETR, a frequency-spatial feature enhancement framework built upon the RT-DETR baseline. By establishing a collaborative modeling mechanism, the method effectively leverages complementary structural information. Specifically, a Spatial Hierarchical Attention Block (SHAB) captures both local details and global dependencies to strengthen semantic representation. Furthermore, to mitigate occlusion in dense scenes, the Deformable Attention-based Intra-scale Feature Interaction (DA-AIFI) focuses on informative regions via dynamic sampling. Finally, the Frequency-Spatial Feature Pyramid Network (FSFPN) integrates frequency filtering with spatial edge extraction via the…
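The "dynamic sampling" that deformable attention relies on can be sketched in a few lines. The example below is an assumption-laden illustration rather than the DA-AIFI module itself: a query at a reference point reads the feature map at a handful of fractional offsets via bilinear interpolation and mixes the samples with softmax weights. In a real model the offsets and logits would be predicted from the query by learned layers; here they are passed in by hand, and the names `bilinear_sample` and `deformable_attend` are hypothetical.

```python
import math

def bilinear_sample(feat, y, x):
    """Read a 2D map at a fractional (y, x); locations outside count as 0."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0

    def at(r, c):
        return feat[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return ((1 - dy) * (1 - dx) * at(y0, x0)
            + (1 - dy) * dx * at(y0, x0 + 1)
            + dy * (1 - dx) * at(y0 + 1, x0)
            + dy * dx * at(y0 + 1, x0 + 1))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def deformable_attend(feat, ref, offsets, logits):
    """Weighted sum of samples taken at ref + offset for each offset."""
    weights = softmax(logits)
    ry, rx = ref
    return sum(wgt * bilinear_sample(feat, ry + oy, rx + ox)
               for wgt, (oy, ox) in zip(weights, offsets))

feat = [[0.0, 1.0, 2.0],
        [3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0]]
offsets = [(0.0, 0.0), (-0.5, 0.5), (1.0, 0.0), (0.0, -1.0)]  # hand-picked
out = deformable_attend(feat, (1.0, 1.0), offsets, [0.0] * len(offsets))
print(out)  # 4.25 with these uniform weights
```

Because each query touches only a few sampled locations instead of every position, attention can concentrate on informative regions, which is the property the abstract credits for handling occlusion in dense scenes.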
