Towards Light Weight Object Detection System
Dharma KC, Venkata Ravi Kiran Dayana, Meng-Lin Wu, Venkateswara Rao Cherukuri, Hau Hwang

TL;DR
This paper introduces a lightweight transformer-based approach to object detection: an approximation of self-attention that reduces latency, a transformer encoder layer that improves accuracy through multi-resolution feature fusion, and a generalized transformer abstraction (gFormer) to guide the design of new architectures.
Contribution
It proposes an approximation of self-attention layers, a transformer encoder for feature fusion, and the gFormer abstraction to advance lightweight object detection.
Findings
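Reduced latency of the transformer-based classification system with minimal loss in accuracy
Improved accuracy of a state-of-the-art lightweight object detector through multi-resolution feature fusion, without significantly increasing the number of parameters
Provided an abstraction (gFormer) that can guide the design of new transformer-like architectures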
Abstract
Transformers are a popular choice for classification tasks and as backbones for object detection tasks. However, their high latency brings challenges in their adaptation to lightweight object detection systems. We present an approximation of the self-attention layers used in the transformer architecture. This approximation reduces the latency of the classification system while incurring minimal loss in accuracy. We also present a method that uses a transformer encoder layer for multi-resolution feature fusion. This feature fusion improves the accuracy of the state-of-the-art lightweight object detection system without significantly increasing the number of parameters. Finally, we provide an abstraction for the transformer architecture called Generalized Transformer (gFormer) that can guide the design of novel transformer-like architectures.
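As a rough illustration of the idea of fusing features from several resolutions with attention, the sketch below flattens feature maps of different spatial sizes into one token sequence, applies standard scaled dot-product self-attention, softmax(QK^T / sqrt(d)) V, with a residual connection, and splits the result back into per-resolution maps. This is not the authors' method: the abstract does not specify their approximation or their fusion layer, so the function names (`self_attention`, `fuse_multi_resolution`), the single-head design, the random weights, and the omission of layer normalization, the feed-forward sublayer, and positional encodings are all assumptions made for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    # Standard single-head scaled dot-product self-attention.
    # tokens: (N, C); w_q, w_k, w_v: (C, C) projection matrices.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, N): cost grows as N^2
    return softmax(scores, axis=-1) @ v


def fuse_multi_resolution(feature_maps, w_q, w_k, w_v):
    # Hypothetical helper: feature_maps is a list of (H_i, W_i, C) arrays
    # that share the same channel count C.
    shapes = [f.shape[:2] for f in feature_maps]
    # Flatten every map into tokens and concatenate across resolutions,
    # so each position can attend to positions at every scale.
    tokens = np.concatenate(
        [f.reshape(-1, f.shape[-1]) for f in feature_maps], axis=0
    )
    fused = tokens + self_attention(tokens, w_q, w_k, w_v)  # residual
    # Split the fused tokens back into the original spatial layouts.
    outputs, start = [], 0
    for h, w in shapes:
        n = h * w
        outputs.append(fused[start:start + n].reshape(h, w, -1))
        start += n
    return outputs


rng = np.random.default_rng(0)
C = 8
maps = [rng.standard_normal((4, 4, C)), rng.standard_normal((2, 2, C))]
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
fused = fuse_multi_resolution(maps, w_q, w_k, w_v)
print([f.shape for f in fused])
```

The (N, N) score matrix is where the cost of exact self-attention concentrates, since N is the total number of tokens across all resolutions. That quadratic growth is the usual reason exact attention is expensive on lightweight hardware, and it is the kind of cost an approximation of the attention layer aims to cut.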
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Infrared Target Detection Methodologies
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Softmax · Label Smoothing · Multi-Head Attention · Adam · Dense Connections
