AdaMixer: A Fast-Converging Query-Based Object Detector

Ziteng Gao; Limin Wang; Bing Han; Sheng Guo

arXiv:2203.16507 · cs.CV · April 1, 2022 · 5 cites

TL;DR

AdaMixer is a query-based object detector that improves convergence speed and accuracy by having each query adaptively sample features and then dynamically decode them with an adaptive MLP-Mixer, removing the need for complex extra networks between the backbone and the decoder.

Contribution

It introduces a simple, fast-converging query-based detection architecture whose decoder adapts both feature sampling and feature decoding to each query, improving convergence speed and accuracy over prior query-based detectors.

Findings

01. Achieves 45.0 AP with 12 training epochs on MS COCO
02. Reaches 49.5 AP with longer training and advanced backbones
03. Demonstrates fast convergence and high accuracy without dense attention modules

Abstract

Traditional object detectors employ the dense paradigm of scanning over locations and scales in an image. The recent query-based object detectors break this convention by decoding image features with a set of learnable queries. However, this paradigm still suffers from slow convergence, limited performance, and design complexity of extra networks between backbone and decoder. In this paper, we find that the key to these issues is the adaptability of decoders for casting queries to varying objects. Accordingly, we propose a fast-converging query-based detector, named AdaMixer, by improving the adaptability of query-based decoding processes in two aspects. First, each query adaptively samples features over space and scales based on estimated offsets, which allows AdaMixer to efficiently attend to the coherent regions of objects. Then, we dynamically decode these sampled features with an…
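The adaptive sampling step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the single-channel feature grids, the softmax-style weighting across pyramid levels, and the `tau` parameter are all assumptions made for illustration. In the real model the offsets would be predicted from each query; here they are simply passed in.

```python
import math


def bilinear_sample(feat, x, y):
    """Bilinearly interpolate a 2D grid feat[y][x] at continuous coords (x, y)."""
    h, w = len(feat), len(feat[0])
    # Clamp to the grid so out-of-range points read the border value.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feat[y0][x0] + wx * feat[y0][x1]
    bottom = (1 - wx) * feat[y1][x0] + wx * feat[y1][x1]
    return (1 - wy) * top + wy * bottom


def adaptive_sample(pyramid, strides, query_xyz, offsets, tau=2.0):
    """Sample one value per offset around a query, over space (x, y) and scale (z).

    pyramid   : list of 2D grids, one per feature level (hypothetical layout)
    strides   : pixel stride of each level, used to map image coords to grid coords
    query_xyz : (x, y, z) with x, y in image pixels and z a continuous level index
    offsets   : list of (dx, dy, dz) offsets; assumed to come from the query
    tau       : softness of the weighting across levels (assumed hyperparameter)
    """
    qx, qy, qz = query_xyz
    samples = []
    for dx, dy, dz in offsets:
        x, y, z = qx + dx, qy + dy, qz + dz
        # Levels closer to z get larger weights; weights are normalized to sum to 1.
        logits = [-((z - level) ** 2) / tau for level in range(len(pyramid))]
        peak = max(logits)
        weights = [math.exp(v - peak) for v in logits]
        total = sum(weights)
        value = sum(
            (wgt / total) * bilinear_sample(feat, x / s, y / s)
            for wgt, feat, s in zip(weights, pyramid, strides)
        )
        samples.append(value)
    return samples
```

As a usage example, calling `adaptive_sample(pyramid, [8, 16], (32.0, 32.0, 0.5), [(0.0, 0.0, 0.0), (4.0, -4.0, 0.3)])` on a two-level pyramid returns two sampled values, one per offset. A full detector would do this per query and per channel, and would feed the sampled features to the decoder.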

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Multi-Head Attention · Adam · Residual Connection · Softmax · Absolute Position Encodings