Focus on Local Regions for Query-based Object Detection

Hongbin Xu; Yamei Xia; Shuai Zhao; Bo Cheng

arXiv:2310.06470 · cs.CV · December 15, 2023 · 1 cites

TL;DR

This paper introduces FoLR, a local-region focused transformer architecture for query-based object detection, improving convergence speed and efficiency by isolating relevant connections and employing adaptive sampling.

Contribution

FoLR's novel local-region self-attention and adaptive sampling significantly enhance query-based object detection performance and convergence speed.

Findings

01. FoLR achieves state-of-the-art results in query-based detection.

02. FoLR converges faster and is more computationally efficient.

03. FoLR outperforms previous methods in accuracy.

Abstract

Query-based methods have garnered significant attention in object detection since the advent of DETR, the pioneering query-based detector. However, these methods face challenges such as slow convergence and suboptimal performance. Notably, self-attention in object detection often hampers convergence because of its global focus. To address these issues, we propose FoLR, a transformer-like architecture with only decoders. We improve self-attention by cutting the connections between irrelevant objects, so that it focuses on local rather than global regions. We also design an adaptive sampling method that extracts effective features from the feature maps based on each query's local region. Additionally, we employ a look-back strategy for decoders to retain previous information, followed by the Feature Mixer module to fuse features and queries. Experimental results demonstrate FoLR's state-of-the-art…
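To make the idea of local self-attention concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how "irrelevant" objects are identified, so the sketch assumes, purely for illustration, that two queries are relevant to each other when their predicted boxes overlap above an IoU threshold. The names `iou`, `local_attention_weights`, and `iou_threshold` are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def local_attention_weights(queries, boxes, iou_threshold=0.1):
    """Attention weights in which query i attends only to 'relevant' queries.

    ASSUMPTION: relevance is box overlap (IoU >= iou_threshold). The source
    does not specify the rule; this one is illustrative only.
    """
    n = len(queries)
    dim = len(queries[0])
    weights = []
    for i in range(n):
        scores = []
        for j in range(n):
            if i == j or iou(boxes[i], boxes[j]) >= iou_threshold:
                dot = sum(x * y for x, y in zip(queries[i], queries[j]))
                scores.append(dot / math.sqrt(dim))  # scaled dot product
            else:
                scores.append(float("-inf"))  # connection is cut
        top = max(scores)  # always finite: a query keeps its own connection
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Two overlapping boxes and one far-away box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
queries = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weights = local_attention_weights(queries, boxes)
```

In this toy setup the third query overlaps nothing, so it attends only to itself, while the first two share attention with each other. A tensor-based version would typically express the same idea as an additive mask on the attention logits.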

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus