ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
Thanh-Dat Truong, Xin Li, Bhiksha Raj, Jackson Cothren, Khoa Luu

TL;DR
This paper introduces ED-SAM, a diffusion sampling method that enhances the domain generalization of vision-language models by generating adversarial samples, leading to state-of-the-art results across multiple datasets.
Contribution
The paper proposes a novel diffusion sampling approach with a transport transformation to improve domain generalization in vision-language models, supported by theoretical analysis.
Findings
Achieves state-of-the-art performance when pre-trained on the CC3M, CC12M, and LAION400M datasets.
Effectively generates adversarial samples to enhance model robustness.
Demonstrates scalability across different dataset sizes.
Abstract
The Vision-Language Foundation Model has recently shown outstanding performance in various perception learning tasks. The outstanding performance of the vision-language model mainly relies on large-scale pre-training datasets and different data augmentation techniques. However, the domain generalization problem of the vision-language foundation model needs to be addressed. This problem has limited the generalizability of the vision-language foundation model to unknown data distributions. In this paper, we introduce a new simple but efficient Diffusion Sampling approach to Domain Generalization (ED-SAM) to improve the generalizability of the vision-language foundation model. Our theoretical analysis in this work reveals the critical role and relation of the diffusion model to domain generalization in the vision-language foundation model. Then, based on the insightful analysis, we…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
