AD-SAM: Fine-Tuning the Segment Anything Vision Foundation Model for Autonomous Driving Perception

Mario Camarena; Het Patel; Fatemeh Nazari; Evangelos Papalexakis; Mohamadhossein Noruzoliaee; Jia Chen

arXiv:2510.27047 · cs.CV · November 3, 2025

TL;DR

This paper introduces AD-SAM, a fine-tuned vision model for autonomous driving perception that significantly improves semantic segmentation accuracy, efficiency, and generalization over existing models like SAM and DeepLabV3.

Contribution

The paper presents a novel dual-encoder architecture with deformable fusion and a hybrid loss for better segmentation in autonomous driving, outperforming prior models on key benchmarks.

Findings

01 Achieves 68.1 mIoU on Cityscapes, surpassing baselines.

02 Demonstrates strong cross-domain generalization with 0.87 retention score.

03 Converges within 30-40 epochs, doubling learning speed.
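
The first finding is reported in mean intersection-over-union (mIoU). As a reminder of what that metric measures, here is a minimal pure-Python sketch. It assumes flat lists of integer class labels and averages only over classes that appear in the prediction or the ground truth. The function name `mean_iou` is hypothetical and not taken from the paper, and benchmark evaluation scripts typically also accumulate counts over a whole dataset and skip ignored labels, which this sketch does not do.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: equal-length sequences of integer class labels.
    This is an illustrative simplification, not the paper's evaluation code.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

Scores such as 68.1 are this quantity expressed as a percentage.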

Abstract

This paper presents the Autonomous Driving Segment Anything Model (AD-SAM), a fine-tuned vision foundation model for semantic segmentation in autonomous driving (AD). AD-SAM extends the Segment Anything Model (SAM) with a dual-encoder and a deformable decoder tailored to the spatial and geometric complexity of road scenes. The dual-encoder produces multi-scale fused representations by combining global semantic context from SAM's pretrained Vision Transformer (ViT-H) with local spatial detail from a trainable convolutional deep learning backbone (i.e., ResNet-50). A deformable fusion module aligns heterogeneous features across scales and object geometries. The decoder performs progressive multi-stage refinement using deformable attention. Training is guided by a hybrid loss that integrates Focal, Dice, Lovasz-Softmax, and Surface losses, improving semantic class balance, boundary precision,…
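
To illustrate how a hybrid loss combines complementary terms, the sketch below sums a Focal term (which down-weights easy pixels) and a Dice term (which measures region overlap) for a binary mask. It is a simplified illustration under stated assumptions, not the authors' implementation: it is binary rather than multi-class, it omits the Lovasz-Softmax and Surface terms, and the weights `w_focal` and `w_dice`, the `gamma` value, and all function names are hypothetical.

```python
import math


def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Binary focal loss averaged over pixels.

    probs: predicted foreground probabilities in [0, 1].
    targets: ground-truth labels, 0 or 1.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        # Probability assigned to the true class, clamped for a stable log.
        pt = p if t == 1 else 1.0 - p
        pt = min(max(pt, eps), 1.0)
        # (1 - pt)^gamma shrinks the contribution of well-classified pixels.
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 minus the smoothed Dice overlap coefficient."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)


def hybrid_loss(probs, targets, w_focal=1.0, w_dice=1.0):
    """Weighted sum of the two terms; the weights are assumed, not from the paper."""
    return w_focal * focal_loss(probs, targets) + w_dice * dice_loss(probs, targets)


good = hybrid_loss([0.9, 0.1, 0.8], [1, 0, 1])
bad = hybrid_loss([0.1, 0.9, 0.2], [1, 0, 1])
print(f"good prediction: {good:.4f}, bad prediction: {bad:.4f}")
```

The two terms pull in different directions by design: the Focal term acts per pixel and helps with class imbalance, while the Dice term scores the mask as a whole.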

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning