Exploring Camera Encoder Designs for Autonomous Driving Perception

Barath Lakshmanan; Joshua Chen; Shiyi Lan; Maying Shen; Zhiding Yu; Jose M. Alvarez

arXiv:2407.07276·cs.CV·July 11, 2024

Open Access

TL;DR

This paper systematically analyzes camera encoder architectures for autonomous driving perception and tailors their design parameters to AV-specific data, yielding a customized encoder that improves mAP by 8.79% over the baseline.

Contribution

It introduces an optimized AV-specific camera encoder architecture through systematic modifications of ConvNeXt, enhancing perception accuracy for autonomous vehicles.

Findings

01 Achieved an 8.79% mAP improvement over the baseline

02 Systematic analysis of encoder design parameters

03 Customized architecture outperforms general-purpose models

Abstract

The cornerstone of autonomous vehicles (AV) is a solid perception system, where camera encoders play a crucial role. Existing works usually leverage pre-trained Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) designed for general vision tasks, such as image classification, segmentation, and 2D detection. Although those well-known architectures have achieved state-of-the-art accuracy in AV-related tasks, e.g., 3D object detection, there remains significant potential for improvement in network design due to the nuanced complexities of industrial-level AV datasets. Moreover, existing public AV benchmarks usually contain insufficient data, which might lead to inaccurate evaluation of those architectures. To reveal AV-specific model insights, we start from a standard general-purpose encoder, ConvNeXt, and progressively transform the design. We adjust different design…
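To make the idea of "progressively transforming" an encoder design concrete, here is a minimal Python sketch, not the authors' method. It assumes the publicly documented ConvNeXt-T layout (stage depths 3, 3, 9, 3; widths 96, 192, 384, 768; 7x7 depthwise convolutions; 4x MLP expansion) as the starting point. The `EncoderConfig` class, the helper functions, and both variants are hypothetical names and choices made for illustration only; the abstract is truncated and does not say which parameters the paper actually changes.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container for ConvNeXt-style design parameters."""
    depths: Tuple[int, ...] = (3, 3, 9, 3)         # blocks per stage (ConvNeXt-T)
    dims: Tuple[int, ...] = (96, 192, 384, 768)    # channels per stage (ConvNeXt-T)
    kernel_size: int = 7                           # depthwise conv kernel
    mlp_ratio: int = 4                             # pointwise expansion factor


def block_params(dim: int, kernel_size: int, mlp_ratio: int) -> int:
    """Learnable parameters in one ConvNeXt block of width `dim`."""
    depthwise = dim * kernel_size * kernel_size + dim  # weights + bias
    layer_norm = 2 * dim                               # scale + shift
    hidden = mlp_ratio * dim
    expand = dim * hidden + hidden                     # first pointwise layer
    project = hidden * dim + dim                       # second pointwise layer
    layer_scale = dim                                  # per-channel gamma
    return depthwise + layer_norm + expand + project + layer_scale


def total_block_params(cfg: EncoderConfig) -> int:
    """Sum block parameters over all stages (stem, downsampling, head excluded)."""
    return sum(
        depth * block_params(dim, cfg.kernel_size, cfg.mlp_ratio)
        for depth, dim in zip(cfg.depths, cfg.dims)
    )


baseline = EncoderConfig()

# Illustrative variants only: each changes one design parameter at a time,
# which is the general pattern of a progressive design study.
deeper_stage3 = replace(baseline, depths=(3, 3, 12, 3))
smaller_kernel = replace(baseline, kernel_size=5)

for name, cfg in [
    ("baseline", baseline),
    ("deeper_stage3", deeper_stage3),
    ("smaller_kernel", smaller_kernel),
]:
    print(f"{name}: {total_block_params(cfg):,} block parameters")
```

Changing one field at a time and re-measuring cost (and, in a real study, accuracy on the target dataset) keeps the effect of each design choice separable. Parameter count is used here only because it can be computed without training; it is not a proxy for the mAP results reported above.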

Taxonomy

Topics: Advanced Vision and Imaging · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods

Methods: Softmax · Attention Is All You Need · ConvNeXt