TL;DR
UHR-DETR is an end-to-end transformer-based detector for small objects in ultra-high-resolution (UHR) remote sensing images. It dynamically allocates computation to informative high-resolution regions and combines scene-level context with object-level detail.
Contribution
The paper introduces a Coverage-Maximizing Sparse Encoder and a Global-Local Decoupled Decoder for effective small object detection in UHR imagery.
Findings
Achieves 2.8% higher mAP on UHR datasets.
Runs inference 10x faster than sliding-window methods.
Operates effectively under limited hardware resources.
Abstract
Ultra-High-Resolution (UHR) imagery has become essential for modern remote sensing, offering unprecedented spatial coverage. However, detecting small objects in such vast scenes presents a critical dilemma: retaining the original resolution for small objects causes prohibitive memory bottlenecks. Conversely, conventional compromises like image downsampling or patch cropping either erase small objects or destroy context. To break this dilemma, we propose UHR-DETR, an efficient end-to-end transformer-based detector designed for UHR imagery. First, we introduce a Coverage-Maximizing Sparse Encoder that dynamically allocates finite computational resources to informative high-resolution regions, ensuring maximum object coverage with minimal spatial redundancy. Second, we design a Global-Local Decoupled Decoder. By integrating macroscopic scene awareness with microscopic object details, this…
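The idea of spending a fixed compute budget on the regions most likely to contain objects can be illustrated with a small sketch. This is not the paper's Coverage-Maximizing Sparse Encoder: it is a generic greedy maximum-coverage heuristic, and it assumes a coarse objectness map is already available (for example, from a downsampled pass). The names `select_windows`, `score_map`, `win`, and `budget` are hypothetical.

```python
import numpy as np

def select_windows(score_map, win, budget):
    """Greedily pick up to `budget` win x win windows on a coarse score map.

    Illustrative only: an assumed greedy max-coverage heuristic, not the
    selection rule used by UHR-DETR.
    """
    remaining = np.asarray(score_map, dtype=float).copy()
    h, w = remaining.shape
    chosen = []
    for _ in range(budget):
        # Summed-area table: every window total becomes four lookups.
        sat = np.zeros((h + 1, w + 1))
        sat[1:, 1:] = remaining.cumsum(axis=0).cumsum(axis=1)
        sums = (sat[win:, win:] - sat[:-win, win:]
                - sat[win:, :-win] + sat[:-win, :-win])
        y, x = np.unravel_index(np.argmax(sums), sums.shape)
        if sums[y, x] <= 0:
            break  # nothing informative is left to cover
        chosen.append((int(y), int(x)))
        # Zero the covered cells so later picks avoid redundant overlap.
        remaining[y:y + win, x:x + win] = 0.0
    return chosen

# Toy 8x8 objectness map with two clusters of different strength.
score = np.zeros((8, 8))
score[0:2, 0:2] = 1.0
score[5:7, 5:7] = 2.0
windows = select_windows(score, win=2, budget=3)
```

In this toy case the budget allows three windows, but only two are selected because no uncovered score remains after the second pick. Zeroing covered cells is what discourages overlapping windows, which loosely mirrors the stated goal of maximum object coverage with minimal spatial redundancy.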
