CvT-ASSD: Convolutional vision-Transformer Based Attentive Single Shot MultiBox Detector

Weiqiang Jin; Hang Yu; Hang Yu

arXiv:2110.12364·cs.CV·October 26, 2021·1 cites

TL;DR

This paper introduces CvT-ASSD, a novel object detection model combining convolutional vision transformers with an efficient single shot detector, achieving good accuracy and efficiency on large-scale datasets.

Contribution

The paper proposes CvT-ASSD, integrating convolutional vision transformers with an attentive single shot detector to improve detection accuracy and computational efficiency.

Findings

01. Achieves competitive detection performance on PASCAL VOC and MS COCO datasets.

02. Reduces computational complexity compared to traditional transformer-based detectors.

03. Demonstrates effective balance between accuracy and efficiency.

Abstract

Due to the success of Bidirectional Encoder Representations from Transformers (BERT) in natural language processing (NLP), the multi-head attention Transformer has become increasingly prevalent in computer vision (CV) research. However, applying it to complex tasks such as visual detection and semantic segmentation remains a challenge for researchers. Although several Transformer-based architectures such as DETR and ViT-FRCNN have been proposed for object detection, they inevitably lose discrimination accuracy and computational efficiency because of the enormous number of learnable parameters and the heavy computational complexity of the traditional self-attention operation. To alleviate these issues, we present a novel object detection architecture, named Convolutional vision Transformer Based Attentive Single Shot MultiBox Detector (CvT-ASSD), that…
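The abstract's point about the cost of traditional self-attention can be illustrated with a rough operation count. The sketch below is not taken from the paper: the function names, the 224x224 input, the patch sizes, and the embedding width are all hypothetical choices made only to show how the attention term grows quadratically with the number of tokens.

```python
def tokens_for_image(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping patch tokens for an image (hypothetical helper)."""
    return (height // patch) * (width // patch)


def attention_cost(num_tokens: int, dim: int) -> dict:
    """Approximate multiply-accumulate counts for one single-head self-attention layer."""
    # Q, K, V and output projections: four (n x d) @ (d x d) products, linear in n.
    projections = 4 * num_tokens * dim * dim
    # Q @ K^T and softmax(scores) @ V: two products that scale with n^2.
    attention = 2 * num_tokens * num_tokens * dim
    return {
        "projections": projections,
        "attention": attention,
        "score_matrix_entries": num_tokens * num_tokens,
    }


if __name__ == "__main__":
    dim = 64  # assumed embedding width, for illustration only
    for patch in (32, 16, 8):  # assumed patch sizes
        n = tokens_for_image(224, 224, patch)
        cost = attention_cost(n, dim)
        print(f"patch={patch:2d} tokens={n:4d} attention_macs={cost['attention']}")
```

Halving the patch size quadruples the token count and multiplies the attention term by sixteen, which is why detectors that need fine spatial resolution tend to look for ways to shrink the token set seen by attention.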

Code & Models

Repositories

albert-jin/cvt-assd (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Convolution · Label Smoothing · Adam · Dropout