CNN-transformer mixed model for object detection

Wenshuo Li

arXiv:2212.06714 · cs.CV · December 14, 2022 · 1 citation

TL;DR

This paper introduces a CNN-transformer hybrid model for object detection that fuses local and global features to improve accuracy while reducing the transformer's computational cost, reporting a 1.7% mAP gain on COCO and 81% accuracy on Pascal VOC.

Contribution

The paper proposes a novel convolutional module with an embedded transformer that improves feature extraction and detection accuracy. The module is integrated into YOLOv5n, where it gives promising experimental results.

Findings

01. mAP improved by 1.7% on the COCO dataset

02. Achieved 81% accuracy on Pascal VOC with fewer parameters

03. Outperforms Faster R-CNN with ResNet-101 in accuracy

Abstract

Object detection, one of the three main tasks of computer vision, is used in a wide range of applications. The typical pipeline uses a deep neural network to extract features from an image and then uses those features to identify the class and location of each object. The main route to higher detection accuracy is therefore a network that extracts better features. In this paper, I propose a convolutional module with a transformer [1]. It aims to improve recognition accuracy by fusing the detailed features extracted by a CNN [2] with the global features extracted by a transformer, and it significantly reduces the computational cost of the transformer module by shrinking the feature map. The main execution steps are convolutional downsampling to reduce the feature map size, then self-attention calculation and upsampling, and finally…
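The saving from shrinking the feature map before self-attention can be estimated with simple arithmetic. The sketch below is not the paper's implementation: it assumes one token per spatial position and a single attention head, and counts only the two token-by-token matrix products. The feature map size, channel count, and stride are hypothetical values chosen for illustration, and the cost of the downsampling convolution and of the upsampling is ignored.

```python
def attention_cost(height, width, channels):
    """Rough multiply-accumulate count for one single-head self-attention layer.

    Counts only the two token-by-token products (Q @ K^T and attention @ V);
    the linear projections and the softmax are ignored.
    """
    tokens = height * width  # assumption: one token per spatial position
    return 2 * tokens * tokens * channels


def downsampled_size(size, stride):
    """Spatial size after stride-`stride` downsampling (requires exact division)."""
    if size % stride != 0:
        raise ValueError("size must be divisible by stride")
    return size // stride


# Hypothetical values for illustration only (not taken from the paper).
H, W, C, STRIDE = 80, 80, 64, 4

small_h = downsampled_size(H, STRIDE)
small_w = downsampled_size(W, STRIDE)

full = attention_cost(H, W, C)
small = attention_cost(small_h, small_w, C)

print(f"tokens at full resolution: {H * W}")
print(f"tokens after stride-{STRIDE} downsampling: {small_h * small_w}")
print(f"attention cost ratio: {full // small}x")
```

Because the attention cost is quadratic in the number of tokens, reducing each spatial dimension by a factor s reduces the token count by s² and this part of the cost by s⁴, which is why downsampling before self-attention is attractive in a lightweight detector.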

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · COVID-19 diagnosis using AI