ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models

Thanh-Dat Truong; Xin Li; Bhiksha Raj; Jackson Cothren; Khoa Luu

arXiv:2406.01432 · cs.CV · June 4, 2024

TL;DR

This paper introduces ED-SAM, a diffusion sampling method that enhances the domain generalization of vision-language models by generating adversarial samples, leading to state-of-the-art results across multiple datasets.

Contribution

The paper proposes a novel diffusion sampling approach with a transport transformation to improve domain generalization in vision-language models, supported by theoretical analysis.
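
The summary does not define the transport transformation, so the sketch below is a generic illustration only, not the paper's construction. It shows the closed-form optimal transport map between two one-dimensional Gaussians, which moves samples from a source distribution (here standing in for standard Gaussian diffusion noise) onto a shifted target distribution. The function name `gaussian_transport_map` and the target parameters are hypothetical.

```python
import random
import statistics


def gaussian_transport_map(x: float, mu_src: float, sigma_src: float,
                           mu_tgt: float, sigma_tgt: float) -> float:
    """Closed-form optimal transport map between two 1-D Gaussians.

    Pushes N(mu_src, sigma_src^2) onto N(mu_tgt, sigma_tgt^2) by
    standardizing x under the source and rescaling it under the target.
    Hypothetical helper for illustration; not taken from ED-SAM.
    """
    if sigma_src <= 0 or sigma_tgt <= 0:
        raise ValueError("standard deviations must be positive")
    return mu_tgt + (sigma_tgt / sigma_src) * (x - mu_src)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical source latents: standard Gaussian noise.
    source = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    # Hypothetical shifted target N(2, 0.5^2), standing in for an
    # out-of-distribution region of the latent space.
    shifted = [gaussian_transport_map(z, 0.0, 1.0, 2.0, 0.5) for z in source]
    print(round(statistics.mean(shifted), 2), round(statistics.stdev(shifted), 2))
```

Because the map is affine with a positive slope, it preserves the ordering of samples while shifting the mean and rescaling the spread, which is why it is the optimal (monotone) map under squared-distance cost in one dimension.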

Findings

1. Achieves state-of-the-art performance on CC3M, CC12M, and LAION400M datasets.
2. Effectively generates adversarial samples to enhance model robustness.
3. Demonstrates scalability across different dataset sizes.

Abstract

Vision-language foundation models have recently shown outstanding performance in various perception learning tasks. This performance mainly relies on large-scale pre-training datasets and diverse data augmentation techniques. However, the domain generalization problem of vision-language foundation models still needs to be addressed, as it limits their generalizability to unknown data distributions. In this paper, we introduce a new, simple, yet efficient Diffusion Sampling approach to Domain Generalization (ED-SAM) to improve the generalizability of the vision-language foundation model. Our theoretical analysis reveals the critical role of the diffusion model in domain generalization for vision-language foundation models. Then, based on this analysis, we…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Diffusion