Pyramid Transformer for Traffic Sign Detection

Omid Nejati Manzari; Amin Boudesh; Shahriar B. Shokouhi

arXiv:2207.06067 · cs.CV · July 25, 2022 · 1 citation

TL;DR

This paper introduces a Pyramid Transformer model with locality mechanisms for traffic sign detection, leveraging multi-scale context and scale invariance to improve detection accuracy on small, unbalanced datasets.

Contribution

The novel Pyramid Transformer architecture incorporates spatial pyramid reduction layers and locality mechanisms, enhancing multi-scale feature learning for traffic sign detection.

Findings

01. Achieved 77.8% mAP on GTSDB when used as the backbone of Cascade RCNN.

02. Outperformed most widely used state-of-the-art models in traffic sign detection.

03. Demonstrated robustness to size discrepancies of traffic signs.

Abstract

Traffic sign detection is a vital task in the vision systems of self-driving cars and automated driving systems. Recently, novel Transformer-based models have achieved encouraging results on various computer vision tasks. However, we observed that vanilla ViT does not yield satisfactory results in traffic sign detection, because the datasets are very small overall and the class distribution of traffic signs is extremely unbalanced. To overcome this problem, a novel Pyramid Transformer with locality mechanisms is proposed in this paper. Specifically, Pyramid Transformer has several spatial pyramid reduction layers that use atrous convolutions to shrink and embed the input image into tokens with rich multi-scale context. Moreover, it inherits an intrinsic scale-invariance inductive bias and is able to learn local feature representations for objects at various scales, thereby…
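As a rough illustration of how atrous (dilated) convolutions can supply multi-scale context while shrinking the image into one token grid, the sketch below works through the shape arithmetic only. The kernel size, stride, dilation rates, and 224×224 input are assumptions chosen for the example, not values taken from the paper, and the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv_output_size(size: int, kernel: int, stride: int,
                     dilation: int, padding: int) -> int:
    """Standard output-size formula for a strided, dilated convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def branch_grids(height: int, width: int, kernel: int = 3, stride: int = 4,
                 dilations: Sequence[int] = (1, 2, 3, 4)
                 ) -> Dict[int, Tuple[int, int]]:
    """Output grid of each parallel dilated branch (all values are assumed)."""
    grids = {}
    for d in dilations:
        # Padding that grows with the dilation keeps every branch aligned
        # on the same output grid (exact for odd kernel sizes).
        padding = d * (kernel - 1) // 2
        grids[d] = (
            conv_output_size(height, kernel, stride, d, padding),
            conv_output_size(width, kernel, stride, d, padding),
        )
    return grids


if __name__ == "__main__":
    for d, (h, w) in branch_grids(224, 224).items():
        print(f"dilation={d}: effective kernel {effective_kernel(3, d)}, "
              f"grid {h}x{w}, tokens {h * w}")
```

Under these assumed settings each branch sees a different context width (effective kernels of 3, 5, 7, and 9 pixels) yet produces the same 56×56 grid, so the branch outputs could be concatenated channel-wise and flattened into 3136 tokens.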

Taxonomy

Topics: Hand Gesture Recognition Systems · Advanced Neural Network Applications · Vehicle License Plate Recognition

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Absolute Position Encodings · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Adam