AdaMixer: A Fast-Converging Query-Based Object Detector
Ziteng Gao, Limin Wang, Bing Han, Sheng Guo

TL;DR
AdaMixer is a query-based object detector that converges faster and reaches higher accuracy by adaptively sampling features for each query and dynamically decoding them with an adaptive MLP-Mixer, removing the need for complex extra networks between the backbone and the decoder.
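As a rough illustration of what "dynamically decoding with an adaptive MLP-Mixer" can mean, the sketch below mixes a small matrix of sampled features first across channels and then across sampling points, with both mixing matrices generated from the query. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `dynamic_weight`, `adaptive_mix`, `channel_gen`, and `spatial_gen` are hypothetical, the weight generator is assumed to be a plain linear map, and normalization layers are omitted.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dynamic_weight(query, generator, rows, cols):
    """Build a rows x cols matrix from the query (hypothetical helper).

    Assumption: the generator is a linear map of shape
    (len(query), rows * cols); its output is reshaped row by row.
    """
    flat = matmul([query], generator)[0]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def adaptive_mix(sampled, query, channel_gen, spatial_gen):
    """Mix sampled features (points x channels) with query-dependent weights."""
    points, channels = len(sampled), len(sampled[0])
    # Channel mixing: every point's feature vector is transformed by a
    # matrix that depends on the query.
    m_channel = dynamic_weight(query, channel_gen, channels, channels)
    mixed = relu(matmul(sampled, m_channel))            # (points, channels)
    # Spatial mixing: information is exchanged between sampling points,
    # again with a query-dependent matrix.
    m_spatial = dynamic_weight(query, spatial_gen, points, points)
    mixed = relu(matmul(transpose(mixed), m_spatial))   # (channels, points)
    return transpose(mixed)                             # (points, channels)
```

Because both matrices are functions of the query, two queries looking at the same sampled features can decode them differently, which is the adaptability the summary refers to.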
Contribution
The paper introduces a simple, fast-converging query-based detection architecture in which both feature sampling and feature decoding adapt to each query, improving efficiency and accuracy over prior query-based detectors.
Findings
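Achieves up to 45.0 AP on the MS COCO validation set with a ResNet-50 backbone and the standard 12 training epochs
Reaches 49.5 AP with a longer training schedule and a ResNeXt-101-DCN backbone
Converges quickly and reaches high accuracy without dense attentional encoders or explicit pyramid networks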
Abstract
Traditional object detectors employ the dense paradigm of scanning over locations and scales in an image. The recent query-based object detectors break this convention by decoding image features with a set of learnable queries. However, this paradigm still suffers from slow convergence, limited performance, and design complexity of extra networks between backbone and decoder. In this paper, we find that the key to these issues is the adaptability of decoders for casting queries to varying objects. Accordingly, we propose a fast-converging query-based detector, named AdaMixer, by improving the adaptability of query-based decoding processes in two aspects. First, each query adaptively samples features over space and scales based on estimated offsets, which allows AdaMixer to efficiently attend to the coherent regions of objects. Then, we dynamically decode these sampled features with an…
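The adaptive sampling step described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions rather than the paper's code: it samples a single-channel, single-scale feature map, whereas the abstract describes sampling over both space and scales; the offsets are passed in directly instead of being predicted from the query; and the names `bilinear_sample` and `adaptive_sample` are hypothetical.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Bilinearly interpolate a 2D grid (list of rows) at continuous (x, y)."""
    height, width = len(feature_map), len(feature_map[0])
    # Clamp the location so the four neighbouring cells always exist.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feature_map[y0][x0] + wx * feature_map[y0][x1]
    bottom = (1 - wx) * feature_map[y1][x0] + wx * feature_map[y1][x1]
    return (1 - wy) * top + wy * bottom


def adaptive_sample(feature_map, center, size, offsets):
    """Sample features at query-specific points.

    center:  (cx, cy) of the query's current box estimate
    size:    (w, h) of that box
    offsets: list of (dx, dy) in box-relative units; assumed here to be
             given, whereas a detector would estimate them per query
    """
    cx, cy = center
    w, h = size
    return [bilinear_sample(feature_map, cx + dx * w, cy + dy * h)
            for dx, dy in offsets]
```

Scaling the offsets by the box size is one way to keep sampling points tied to the object's extent, so large and small objects are covered with the same number of points.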
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Multi-Head Attention · Adam · Residual Connection · Softmax · Absolute Position Encodings
