Pyramid Transformer for Traffic Sign Detection
Omid Nejati Manzari, Amin Boudesh, Shahriar B. Shokouhi

TL;DR
This paper introduces a Pyramid Transformer model with locality mechanisms for traffic sign detection, leveraging multi-scale context and scale invariance to improve detection accuracy on small, unbalanced datasets.
Contribution
The novel Pyramid Transformer architecture incorporates spatial pyramid reduction layers and locality mechanisms, enhancing multi-scale feature learning for traffic sign detection.
Findings
Achieved 77.8% mAP on GTSDB when used as the backbone of a Cascade R-CNN detector.
Surpassed most well-known, widely used state-of-the-art models for traffic sign detection on this benchmark.
Demonstrated robustness to size discrepancies of traffic signs.
Abstract
Traffic sign detection is a vital task in the visual system of self-driving cars and automated driving systems. Recently, novel Transformer-based models have achieved encouraging results for various computer vision tasks. However, we observed that vanilla ViT could not yield satisfactory results in traffic sign detection because the datasets are very small overall and the class distribution of traffic signs is extremely unbalanced. To overcome this problem, a novel Pyramid Transformer with locality mechanisms is proposed in this paper. Specifically, Pyramid Transformer has several spatial pyramid reduction layers that use atrous convolutions to shrink and embed the input image into tokens with rich multi-scale context. Moreover, it inherits an intrinsic scale invariance inductive bias and is able to learn local feature representation for objects at various scales, thereby…
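The spatial pyramid reduction idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the dilation rates (1, 2, 3), the stride of 4, the single-channel input, and the shared 3x3 averaging kernel are all assumptions chosen for readability. A real layer would most likely use learned, multi-channel convolutions with separate weights per dilation rate.

```python
def atrous_conv2d(image, kernel, dilation, stride):
    """Single-channel 2D atrous (dilated) convolution with zero padding."""
    h, w = len(image), len(image[0])
    k = len(kernel)              # square kernel with odd size assumed
    pad = dilation * (k // 2)    # centres the kernel so all dilation rates align
    out = []
    for i in range(0, h, stride):
        row = []
        for j in range(0, w, stride):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    y = i + a * dilation - pad
                    x = j + b * dilation - pad
                    if 0 <= y < h and 0 <= x < w:   # outside pixels count as zero
                        acc += kernel[a][b] * image[y][x]
            row.append(acc)
        out.append(row)
    return out


def pyramid_reduction(image, kernel, dilations=(1, 2, 3), stride=4):
    """Shrink the image by `stride` and emit one token per output position.

    Each token concatenates the responses at several dilation rates, so it
    mixes context from several receptive-field sizes (hypothetical settings).
    """
    maps = [atrous_conv2d(image, kernel, d, stride) for d in dilations]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[m[r][c] for m in maps] for r in range(rows) for c in range(cols)]


# Toy 16x16 input and a 3x3 averaging kernel (both illustrative).
image = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
box = [[1.0 / 9.0] * 3 for _ in range(3)]
tokens = pyramid_reduction(image, box)
print(len(tokens), len(tokens[0]))  # 16 3
```

With these assumed settings, a 16x16 input is reduced to a 4x4 grid, giving 16 tokens of 3 features each. Enlarging the dilation rate widens the receptive field without adding kernel weights, which is how a single reduction step can gather context at several scales.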
Taxonomy
Topics: Hand Gesture Recognition Systems · Advanced Neural Network Applications · Vehicle License Plate Recognition
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Absolute Position Encodings · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Adam
