Training Strategies for Vision Transformers for Object Detection

Apoorv Singh

arXiv:2304.02186 · cs.CV · April 6, 2023 · 1 citation

Open Access

TL;DR

This paper explores optimization strategies for vision transformer-based object detection in autonomous driving, achieving significant inference-time improvements with minimal performance loss, suitable for real-time deployment on edge devices.

Contribution

It introduces inference-time optimization strategies for vision transformers, balancing accuracy and speed, and demonstrates their effectiveness in real-world autonomous driving scenarios.

Findings

01 Inference time improved by 63% with only a 3% drop in accuracy.

02 Transformer inference time reduced below that of traditional CNN detectors.

03 Strategies validated at float32 and float16 precision using TensorRT.
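
The headline percentages above are relative quantities, and the summary does not say exactly how they were computed. As a rough illustration only, the sketch below shows one plausible reading: latency reduction and accuracy drop, each measured relative to a baseline. The function names and all numbers are hypothetical, chosen only so the arithmetic reproduces 63% and 3%; they are not measurements from the paper.

```python
def latency_reduction(baseline_ms: float, optimized_ms: float) -> float:
    """Fraction of baseline latency removed by the optimization."""
    return (baseline_ms - optimized_ms) / baseline_ms


def relative_accuracy_drop(baseline_acc: float, optimized_acc: float) -> float:
    """Fraction of baseline accuracy lost after the optimization."""
    return (baseline_acc - optimized_acc) / baseline_acc


# Hypothetical values, NOT taken from the paper.
baseline_ms, optimized_ms = 100.0, 37.0
baseline_acc, optimized_acc = 0.500, 0.485

reduction = latency_reduction(baseline_ms, optimized_ms)
drop = relative_accuracy_drop(baseline_acc, optimized_acc)
speedup = baseline_ms / optimized_ms  # the same change expressed as a speedup factor

print(f"latency reduction: {reduction:.0%}")
print(f"accuracy drop:     {drop:.0%}")
print(f"speedup factor:    {speedup:.2f}x")
```

Under this reading, a 63% latency reduction corresponds to a speedup of roughly 2.7x. If "improved by 63%" instead meant a 1.63x speedup, the latency reduction would be closer to 39%, so it is worth checking the full paper for the definition used.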

Abstract

Vision-based Transformers have found wide application in the perception module of autonomous driving for predicting accurate 3D bounding boxes, owing to their strong capability in modeling long-range dependencies between visual features. However, Transformers, initially designed for language models, have mostly focused on accuracy rather than on the inference-time budget. For a safety-critical system like autonomous driving, real-time inference on the on-board compute is an absolute necessity. This keeps our object detection algorithm under a very tight run-time budget. In this paper, we evaluate a variety of strategies to optimize the inference time of vision-transformer-based object detection methods while keeping a close watch on any performance variations. Our chosen metric for these strategies is accuracy-runtime joint optimization. Moreover, for…

Taxonomy

Topics: Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Dropout · Dense Connections · Convolution · Adam · Non Maximum Suppression · Layer Normalization · Softmax · Linear Layer · 1x1 Convolution