AD-SAM: Fine-Tuning the Segment Anything Vision Foundation Model for Autonomous Driving Perception
Mario Camarena, Het Patel, Fatemeh Nazari, Evangelos Papalexakis, Mohamadhossein Noruzoliaee, Jia Chen

TL;DR
This paper introduces AD-SAM, a fine-tuned vision model for autonomous driving perception that significantly improves semantic segmentation accuracy, efficiency, and generalization over existing models like SAM and DeepLabV3.
Contribution
The paper presents a novel dual-encoder architecture with deformable fusion and a hybrid loss for better segmentation in autonomous driving, outperforming prior models on key benchmarks.
Findings
Achieves 68.1 mIoU on Cityscapes, surpassing baselines.
Demonstrates strong cross-domain generalization with a retention score of 0.87.
Converges within 30-40 epochs, about twice as fast as the compared baselines.
Abstract
This paper presents the Autonomous Driving Segment Anything Model (AD-SAM), a fine-tuned vision foundation model for semantic segmentation in autonomous driving (AD). AD-SAM extends the Segment Anything Model (SAM) with a dual-encoder and a deformable decoder tailored to the spatial and geometric complexity of road scenes. The dual-encoder produces multi-scale fused representations by combining global semantic context from SAM's pretrained Vision Transformer (ViT-H) with local spatial detail from a trainable convolutional backbone (ResNet-50). A deformable fusion module aligns heterogeneous features across scales and object geometries. The decoder performs progressive multi-stage refinement using deformable attention. Training is guided by a hybrid loss that integrates Focal, Dice, Lovasz-Softmax, and Surface losses, improving semantic class balance, boundary precision,…
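To make the idea of a hybrid segmentation loss concrete, the following is a minimal sketch that combines only two of the four terms named in the abstract, Focal and Dice, over per-pixel class probabilities. It is not the authors' implementation: the function names, the equal weights `w_focal` and `w_dice`, the focusing parameter `gamma=2.0`, and the omission of the Lovasz-Softmax and Surface terms are all assumptions made for illustration.

```python
import math

def focal_loss(probs, labels, gamma=2.0, eps=1e-7):
    """Mean focal loss; probs[i][c] is the predicted probability of class c at pixel i."""
    total = 0.0
    for p, y in zip(probs, labels):
        pt = min(max(p[y], eps), 1.0)  # probability assigned to the true class
        # (1 - pt)^gamma down-weights pixels that are already classified well
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(labels)

def dice_loss(probs, labels, num_classes, eps=1e-7):
    """One minus the mean soft Dice coefficient over classes."""
    dice_sum = 0.0
    for c in range(num_classes):
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred = sum(p[c] for p in probs)
        true = sum(1 for y in labels if y == c)
        dice_sum += (2.0 * inter + eps) / (pred + true + eps)
    return 1.0 - dice_sum / num_classes

def hybrid_loss(probs, labels, num_classes, w_focal=1.0, w_dice=1.0):
    """Weighted sum of the two terms; the weights are hypothetical, not from the paper."""
    return (w_focal * focal_loss(probs, labels)
            + w_dice * dice_loss(probs, labels, num_classes))

# Two pixels, two classes: a perfect prediction and a confidently wrong one.
perfect = [[1.0, 0.0], [0.0, 1.0]]
wrong = [[0.1, 0.9], [0.9, 0.1]]
labels = [0, 1]
print(hybrid_loss(perfect, labels, num_classes=2))
print(hybrid_loss(wrong, labels, num_classes=2))
```

The focal term addresses class imbalance by concentrating the gradient on hard pixels, while the Dice term scores region overlap per class, so rare classes count as much as frequent ones. In a real training setup these would be written as differentiable tensor operations in a deep learning framework rather than Python loops.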
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
