Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation

Brunó B. Englert; Fabrizio J. Piva; Tommie Kerssies; Daan de Geus; Gijs Dubbelman

arXiv:2406.09896·cs.CV·June 18, 2024·1 cites

TL;DR

This paper shows that Vision Foundation Models (VFMs) and Unsupervised Domain Adaptation (UDA) are complementary for semantic segmentation: combining them improves UDA performance, maintains the out-of-distribution performance of VFMs, and makes certain time-consuming UDA components redundant, which speeds up inference.

Contribution

It introduces VFM-UDA, a method that integrates VFMs with UDA and, at equivalent model sizes, delivers faster inference and better accuracy than previous methods.

Findings

01. 8.4× speedup over previous methods

02. +1.2 mIoU improvement in UDA performance

03. +6.1 mIoU in out-of-distribution generalization
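
The accuracy findings above are reported in mIoU (mean intersection-over-union), the standard semantic segmentation metric. As a rough illustration of what a "+1.2 mIoU" gain measures, here is a minimal sketch in plain Python. It is not taken from the paper or from the tue-mps/vfm-uda repository: the function name `mean_iou`, the flat label lists, and the `ignore_index=255` default are assumptions made for this example.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Return mIoU in percent for flat lists of predicted and true class ids.

    Assumptions (hypothetical, for illustration only): predictions are
    always in [0, num_classes), and ground-truth pixels labelled
    `ignore_index` are excluded from the score.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabelled pixel: skipped entirely
        if p == t:
            # Correct pixel: counts once in both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Average only over classes that occur in the prediction or the labels.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return 100.0 * sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
    print(round(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2), 2))
```

Because the score is an unweighted average over classes, a change of one mIoU point means a one-percentage-point change in the average per-class overlap, regardless of how many pixels each class covers.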

Abstract

Achieving robust generalization across diverse data domains remains a significant challenge in computer vision. This challenge is important in safety-critical applications, where deep-neural-network-based systems must perform reliably under various environmental conditions not seen during training. Our study investigates whether the generalization capabilities of Vision Foundation Models (VFMs) and Unsupervised Domain Adaptation (UDA) methods for the semantic segmentation task are complementary. Results show that combining VFMs with UDA has two main benefits: (a) it allows for better UDA performance while maintaining the out-of-distribution performance of VFMs, and (b) it makes certain time-consuming UDA components redundant, thus enabling significant inference speedups. Specifically, with equivalent model sizes, the resulting VFM-UDA method achieves an 8.4× speed increase over…

Code & Models

Repositories

tue-mps/vfm-uda (PyTorch, Official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings