A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions

Quoc-Vinh Lai-Dang

arXiv:2403.07542 · cs.CV · March 13, 2024 · 3 citations


Open Access

TL;DR

This survey reviews the adoption of Vision Transformers in autonomous driving, highlighting their advantages over traditional models in scene understanding and outlining future research directions.

Contribution

The survey provides a comprehensive overview of Vision Transformer applications in autonomous driving, covering their architectures, advantages, limitations, and future prospects.

Findings

01

Transformers outperform CNNs at capturing global scene context.

02

Vision Transformers are effective in object detection and segmentation.

03

Future research is directed at improving the real-time capabilities of Transformer-based autonomous driving systems.

Abstract

This survey explores the adaptation of Vision Transformer models to Autonomous Driving, a transition inspired by their success in Natural Language Processing. Transformers are gaining traction in computer vision: they surpass traditional Recurrent Neural Networks in tasks such as sequential image processing and outperform Convolutional Neural Networks at capturing global context, as evidenced in complex scene recognition. These capabilities are crucial in Autonomous Driving, which requires real-time processing of dynamic visual scenes. Our survey provides a comprehensive overview of Vision Transformer applications in Autonomous Driving, focusing on foundational concepts such as self-attention, multi-head attention, and the encoder-decoder architecture. We cover applications in object detection, segmentation, pedestrian detection, lane detection, and more, comparing their architectural merits and limitations.…
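The abstract names self-attention and multi-head attention as the foundational concepts behind the global context capture that Vision Transformers offer. As a minimal illustrative sketch (not code from the survey; the function names, shapes, and random weights are assumptions chosen for demonstration), the NumPy snippet computes scaled dot-product multi-head self-attention over a set of image-patch tokens. Because every patch scores every other patch, a single layer already has a global receptive field, whereas a convolution only mixes a local neighborhood.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Illustrative multi-head self-attention over patch tokens.

    tokens: (n, d) array, one row per image patch (hypothetical input).
    w_q, w_k, w_v, w_o: (d, d) projection matrices.
    Returns the (n, d) output and the (num_heads, n, n) attention weights.
    """
    n, d = tokens.shape
    assert d % num_heads == 0, "model width must split evenly across heads"
    d_head = d // num_heads

    def split_heads(x):
        # (n, d) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(tokens @ w_q)
    k = split_heads(tokens @ w_k)
    v = split_heads(tokens @ w_v)

    # Every patch is scored against every other patch (global context).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (h, n, n)
    weights = softmax(scores, axis=-1)                    # rows sum to 1
    heads = weights @ v                                   # (h, n, d_head)

    # Concatenate the heads back to (n, d) and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(n, d)
    return merged @ w_o, weights

# Usage: 16 patch tokens of width 32, split across 4 heads.
rng = np.random.default_rng(0)
n_patches, d_model, n_heads = 16, 32, 4
x = rng.standard_normal((n_patches, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads=n_heads)
print(out.shape, attn.shape)  # (16, 32) (4, 16, 16)
```

Splitting the width into several heads lets each head learn its own attention pattern at the same total cost as one wide head; the encoder-decoder architectures the survey discusses stack such layers with feed-forward sublayers and residual connections.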

Taxonomy

Topics: Advanced Neural Network Applications · Currency Recognition and Detection

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Dropout · Softmax · Residual Connection · Linear Layer · Dense Connections · Label Smoothing