Task Specific Attention is one more thing you need for object detection
Sang Yon Lee

TL;DR
This paper introduces a novel attention-based model called Task Specific Split Transformer (TSST) that achieves state-of-the-art object detection performance on COCO without relying on traditional hand-designed components like anchors and NMS.
Contribution
The paper proposes TSST, a new attention module that splits general-purpose attention into goal-specific parts, enabling simpler and more effective end-to-end object detection models.
Findings
TSST achieves state-of-the-art results on COCO.
The approach eliminates the need for anchors and NMS.
Extensive experiments validate the effectiveness of the method.
Abstract
Various models have been proposed to perform object detection. However, most require many hand-designed components, such as anchors and non-maximum suppression (NMS), to demonstrate good performance. To mitigate these issues, the Transformer-based DETR and its variant, Deformable DETR, were proposed. These have resolved much of the complexity in designing a head for object detection models; however, doubts remain about whether Transformer-based models can be considered state-of-the-art in object detection, because other models that depend on anchors and NMS have shown better results. Furthermore, it has been unclear whether an end-to-end pipeline could be built from attention modules alone, because the DETR-adapted Transformer method used a convolutional neural network (CNN) for its backbone. In this study, we propose that combining several…
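For context on what the abstract means by a hand-designed component, the following is a minimal sketch of classic greedy NMS, the post-processing step that end-to-end detectors such as DETR and TSST aim to remove. It is not taken from the paper; the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are illustrative assumptions.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2), with x2 >= x1 and y2 >= y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    The threshold is a hand-tuned hyperparameter (illustrative default here),
    which is part of why NMS counts as a hand-designed component.
    """
    # Visit candidates from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # the second box overlaps the first and is dropped
```

The threshold and the greedy visiting order are both design choices made by hand, which is the kind of tuning a set-prediction detector avoids by emitting one prediction per object directly.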
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Linear Layer · Label Smoothing · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Adam · Dropout · Absolute Position Encodings · Convolution · Softmax
