Focus on Local Regions for Query-based Object Detection
Hongbin Xu, Yamei Xia, Shuai Zhao, Bo Cheng

TL;DR
This paper introduces FoLR, a decoder-only, transformer-like architecture for query-based object detection that focuses on local regions. It improves convergence speed and efficiency by cutting self-attention connections between irrelevant objects and by sampling features adaptively from each query's local region.
Contribution
FoLR's novel local-region self-attention and adaptive sampling significantly enhance query-based object detection performance and convergence speed.
Findings
FoLR achieves state-of-the-art results in query-based detection.
FoLR converges faster and is more computationally efficient.
FoLR outperforms previous methods in accuracy.
Abstract
Query-based methods have garnered significant attention in object detection since the advent of DETR, the pioneering query-based detector. However, these methods face challenges such as slow convergence and suboptimal performance. Notably, self-attention in object detection often hampers convergence because of its global focus. To address these issues, we propose FoLR, a transformer-like architecture with only decoders. We improve self-attention by isolating connections between irrelevant objects, which makes it focus on local regions rather than global ones. We also design an adaptive sampling method that extracts effective features from the feature maps based on each query's local region. Additionally, we employ a look-back strategy for decoders to retain previous information, followed by a Feature Mixer module that fuses features and queries. Experimental results demonstrate FoLR's state-of-the-art…
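As a rough illustration of the local-region idea in the abstract, the sketch below masks self-attention between queries whose predicted boxes barely overlap. It is not the paper's implementation: the abstract does not say how "irrelevant" objects are identified, so the IoU criterion, the `iou_threshold` parameter, and the function names `box_iou` and `local_self_attention` are all assumptions, and the learned query/key/value projections of a real decoder are left out for brevity.

```python
import numpy as np

def box_iou(boxes):
    """Pairwise IoU for boxes given as (x1, y1, x2, y2); returns an (N, N) matrix."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / np.maximum(union, 1e-9)

def local_self_attention(queries, boxes, iou_threshold=0.1):
    """Self-attention over object queries, restricted to queries with overlapping boxes.

    Assumption: two queries are "relevant" to each other when their boxes have
    IoU >= iou_threshold. This stands in for whatever criterion the paper uses.
    """
    d = queries.shape[-1]
    scores = queries @ queries.T / np.sqrt(d)      # (N, N) scaled dot-product scores
    keep = box_iou(boxes) >= iou_threshold         # True where attention is allowed
    np.fill_diagonal(keep, True)                   # a query always attends to itself
    scores = np.where(keep, scores, -np.inf)       # cut connections between distant objects
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)                       # masked entries become exactly 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ queries, weights

# Toy example: boxes 0 and 1 overlap heavily; box 2 is far away from both.
rng = np.random.default_rng(0)
queries = rng.normal(size=(3, 8))
boxes = np.array([[0.0, 0.0, 10.0, 10.0],
                  [1.0, 1.0, 11.0, 11.0],
                  [50.0, 50.0, 60.0, 60.0]])
output, weights = local_self_attention(queries, boxes)
```

In this toy case the third query attends only to itself, while the first two share attention with each other. Compared with global self-attention, each query mixes information only with its spatial neighbours, which is the behaviour the abstract attributes to FoLR's modified self-attention.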
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus
