FQDet: Fast-converging Query-based Detector
Cédric Picron, Punarjay Chakravarty, Tinne Tuytelaars

TL;DR
FQDet is a fast-converging query-based two-stage detector that improves the cross-attention prior with anchors, which speeds up convergence and raises accuracy; it outperforms existing two-stage detectors with less training time.
Contribution
The paper proposes an improved query-based two-stage detector, FQDet, with better cross-attention priors that eliminate the need for auxiliary losses and iterative bounding box mechanisms.
Findings
Achieves 45.4 AP on COCO val after 12 epochs with ResNet-50+TPN.
Outperforms Cascade R-CNN with the same backbone and less training.
Reaches 52.9 AP on COCO test-dev with larger backbone and multi-scale testing.
Abstract
Recently, two-stage Deformable DETR introduced the query-based two-stage head, a new type of two-stage head different from the region-based two-stage heads of classical detectors such as Faster R-CNN. In query-based two-stage heads, the second stage selects one feature per detection, called the query, which is processed by a transformer, as opposed to pooling a rectangular grid of features processed by CNNs as in region-based detectors. In this work, we improve the query-based head by improving the prior of the cross-attention operation with anchors, significantly speeding up convergence while increasing performance. Additionally, we empirically show that by improving the cross-attention prior, the auxiliary losses and iterative bounding box mechanisms typically used by DETR-based detectors are no longer needed. By combining the best of both the classical and the DETR-based detectors, our FQDet…
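The contrast the abstract draws between the two kinds of second-stage input can be illustrated with a minimal NumPy sketch. This is not the FQDet implementation: the function names `select_queries` and `pool_region`, the tensor shapes, the 7x7 grid, and the nearest-neighbour sampling are all assumptions chosen for illustration, and the transformer and CNN processing that follow each step are omitted.

```python
import numpy as np

def select_queries(features, scores, k):
    """Query-style selection: keep one C-dim feature per top-k scoring location.

    Hypothetical helper; `features` is (H, W, C) and `scores` is (H, W).
    """
    h, w, c = features.shape
    flat_feats = features.reshape(h * w, c)
    flat_scores = scores.reshape(h * w)
    # Indices of the k highest scores, ordered from highest to lowest.
    top = np.argsort(-flat_scores, kind="stable")[:k]
    return flat_feats[top], top

def pool_region(features, box, grid=7):
    """Region-style pooling: sample a grid x grid block of features inside a box.

    Hypothetical helper; box = (y0, x0, y1, x1) in feature-map coordinates.
    Nearest-neighbour sampling is used for brevity (an assumption, simpler
    than the bilinear sampling used by RoIAlign-style operators).
    """
    y0, x0, y1, x1 = box
    ys = np.rint(np.linspace(y0, y1, grid)).astype(int)
    xs = np.rint(np.linspace(x0, x1, grid)).astype(int)
    return features[np.ix_(ys, xs)]

rng = np.random.default_rng(0)
features = rng.standard_normal((32, 32, 256))  # toy feature map
scores = rng.random((32, 32))                  # toy per-location scores

queries, idx = select_queries(features, scores, k=300)
region = pool_region(features, box=(4, 4, 20, 28))

print(queries.shape)  # (300, 256): one feature vector per detection
print(region.shape)   # (7, 7, 256): a rectangular grid per detection
```

The point of the sketch is only the difference in shape: a query-based head carries a single feature vector per candidate detection, whereas a region-based head carries a fixed grid of features per candidate.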
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Label Smoothing · Byte Pair Encoding · Adam · Dense Connections · Absolute Position Encodings · Dropout
