Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Xiuquan Hou; Meiqin Liu; Senlin Zhang; Ping Wei; Badong Chen

arXiv:2403.16131 · cs.CV · March 26, 2024 · 3 cites

TL;DR

Salience DETR introduces hierarchical salience filtering and query refinement to improve detection accuracy and efficiency in transformer-based object detection, reducing computational load while enhancing performance.

Contribution

The paper proposes a novel hierarchical salience filtering refinement method for DETR, addressing scale bias and semantic misalignment issues in two-stage detection frameworks.

Findings

01. Achieves +4.0% AP improvement on three detection datasets

02. Attains 49.2% AP on COCO 2017 with fewer FLOPs

03. Demonstrates better trade-off between efficiency and accuracy

Abstract

DETR-like methods have significantly increased detection performance in an end-to-end manner. Their mainstream two-stage frameworks perform dense self-attention and select a fraction of queries for sparse cross-attention, which has proven effective for improving performance but also introduces a heavy computational burden and a strong dependence on stable query selection. This paper demonstrates that suboptimal two-stage selection strategies result in scale bias and redundancy because the selected queries and the objects are mismatched in two-stage initialization. To address these issues, we propose hierarchical salience filtering refinement, which performs transformer encoding only on filtered discriminative queries, for a better trade-off between computational efficiency and precision. The filtering process overcomes scale bias through a novel scale-independent salience supervision. To…
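
The filtering idea in the abstract (encode only the most discriminative queries, chosen per feature level) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the level names, the keep ratios, and the use of plain top-k selection on precomputed salience scores are all assumptions made for illustration. How salience is predicted and supervised is not covered here.

```python
from typing import Dict, List, Sequence


def filter_salient_queries(salience: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the highest-salience tokens, in their original order.

    Hypothetical helper: keeps the top `keep_ratio` fraction of tokens
    (at least one) ranked by an already-computed salience score.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not salience:
        return []
    num_keep = max(1, int(len(salience) * keep_ratio))
    # Rank token indices by descending salience; ties keep original order.
    ranked = sorted(range(len(salience)), key=lambda i: salience[i], reverse=True)
    return sorted(ranked[:num_keep])


def hierarchical_filter(
    level_salience: Dict[str, Sequence[float]],
    level_keep_ratios: Dict[str, float],
) -> Dict[str, List[int]]:
    """Apply a separate keep ratio to each feature level (assumed layout)."""
    return {
        level: filter_salient_queries(scores, level_keep_ratios[level])
        for level, scores in level_salience.items()
    }


if __name__ == "__main__":
    # Made-up salience scores for two feature levels.
    scores = {
        "high_res": [0.9, 0.1, 0.4, 0.8, 0.05, 0.3, 0.7, 0.2],
        "low_res": [0.6, 0.2, 0.95, 0.1],
    }
    ratios = {"high_res": 0.25, "low_res": 0.5}
    kept = hierarchical_filter(scores, ratios)
    print(kept)  # {'high_res': [0, 3], 'low_res': [0, 2]}
```

Only the tokens at the returned indices would then be passed to the encoder, which is where the reduction in computation would come from under these assumptions.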

Taxonomy

Topics: Visual Attention and Saliency Detection · Infrared Target Detection Methodologies

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Softmax · Feedforward Network