Exploring Camera Encoder Designs for Autonomous Driving Perception
Barath Lakshmanan, Joshua Chen, Shiyi Lan, Maying Shen, Zhiding Yu, Jose M. Alvarez

TL;DR
This paper systematically analyzes and customizes camera encoder architectures for autonomous driving perception, significantly improving accuracy over standard models by tailoring design parameters to AV-specific data.
Contribution
It introduces an optimized AV-specific camera encoder architecture through systematic modifications of ConvNeXt, enhancing perception accuracy for autonomous vehicles.
Findings
Achieved 8.79% mAP improvement over baseline
Systematic analysis of encoder design parameters
Customized architecture outperforms general-purpose models
Abstract
The cornerstone of autonomous vehicles (AV) is a solid perception system, where camera encoders play a crucial role. Existing works usually leverage pre-trained Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) designed for general vision tasks, such as image classification, segmentation, and 2D detection. Although those well-known architectures have achieved state-of-the-art accuracy in AV-related tasks, e.g., 3D object detection, there remains significant potential for improvement in network design due to the nuanced complexities of industrial-level AV datasets. Moreover, existing public AV benchmarks usually contain insufficient data, which might lead to inaccurate evaluation of those architectures. To reveal the AV-specific model insights, we start from a standard general-purpose encoder, ConvNeXt, and progressively transform the design. We adjust different design…
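To make "design parameters" concrete, here is a minimal sketch of how two knobs of the public ConvNeXt design, per-stage depth and per-stage width, determine encoder size. It is an illustration only, not the authors' method: the abstract is truncated before it names which parameters the paper actually varies. The function names `block_params` and `encoder_params` and the shallower variant are hypothetical; the ConvNeXt-T depths and widths are those of the published ConvNeXt architecture.

```python
from typing import Sequence


def block_params(dim: int, kernel: int = 7, expansion: int = 4) -> int:
    """Learnable parameters in one ConvNeXt-style block of width `dim`."""
    depthwise = dim * kernel * kernel + dim   # depthwise conv weights + bias
    norm = 2 * dim                            # LayerNorm scale + shift
    hidden = expansion * dim
    pointwise_in = dim * hidden + hidden      # 1x1 expansion layer
    pointwise_out = hidden * dim + dim        # 1x1 projection layer
    layer_scale = dim                         # per-channel residual scale
    return depthwise + norm + pointwise_in + pointwise_out + layer_scale


def encoder_params(depths: Sequence[int], dims: Sequence[int],
                   in_channels: int = 3) -> int:
    """Parameters of a ConvNeXt-style backbone (no final norm, no head)."""
    # Stem: 4x4 stride-4 convolution followed by LayerNorm.
    total = in_channels * dims[0] * 4 * 4 + dims[0] + 2 * dims[0]
    for i, (depth, dim) in enumerate(zip(depths, dims)):
        if i > 0:
            prev = dims[i - 1]
            # Downsampling between stages: LayerNorm + 2x2 stride-2 conv.
            total += 2 * prev + prev * dim * 2 * 2 + dim
        total += depth * block_params(dim)
    return total


# Published ConvNeXt-T configuration.
convnext_t = encoder_params(depths=(3, 3, 9, 3), dims=(96, 192, 384, 768))

# Hypothetical shallower variant, chosen here purely for illustration.
shallow = encoder_params(depths=(2, 2, 6, 2), dims=(96, 192, 384, 768))

print(f"ConvNeXt-T backbone: {convnext_t:,} parameters")
print(f"Shallower variant:   {shallow:,} parameters")
```

The count excludes the final normalization layer and the classification head, so it comes out slightly below the roughly 28.6M usually quoted for ConvNeXt-T. Most of the budget sits in the last two stages, so changing depth there moves the total far more than changing it in the early stages.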
Taxonomy
Topics: Advanced Vision and Imaging · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods
Methods: Softmax · Attention Is All You Need · ConvNeXt
