Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models
Minho Park, Sunghyun Park, Jooyeol Yun, Jaegul Choo

TL;DR
This paper introduces regularization techniques to improve fine-tuning of vision-language models on generated datasets, effectively addressing domain gaps and enhancing performance in name-only transfer scenarios.
Contribution
The paper proposes two regularization methods, one applied during training and one applied post-training, to mitigate the domain gap between real and generated images when fine-tuning vision-language models on generated datasets.
Findings
Regularization improves the real-image performance of models fine-tuned on generated datasets.
Feature diversity correlates with better transfer results.
The proposed methods achieve state-of-the-art performance on multiple datasets.
Abstract
Recent advancements in text-to-image generation have inspired researchers to generate datasets tailored for perception models using generative models, which prove particularly valuable in scenarios where real-world data is limited. In this study, our goal is to address the challenges when fine-tuning vision-language models (e.g., CLIP) on generated datasets. Specifically, we aim to fine-tune vision-language models to a specific classification model without access to any real images, also known as name-only transfer. However, despite the high fidelity of generated images, we observed a significant performance degradation when fine-tuning the model using the generated datasets due to the domain gap between real and generated images. To overcome the domain gap, we provide two regularization methods for training and post-training, respectively. First, we leverage the domain-agnostic…
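The abstract does not spell out how a vision-language model acts as a classifier when only class names are available. The sketch below assumes the standard CLIP-style zero-shot setup, in which an image feature is compared by cosine similarity with one text feature per class name and the scaled similarities are passed through a softmax. It shows only this starting point for name-only transfer, not the paper's regularization methods. The function names, the `logit_scale` value, and the three-dimensional feature vectors are hypothetical stand-ins for illustration; real features would come from the model's image and text encoders.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def name_only_probs(image_feature, class_text_features, logit_scale=100.0):
    """Return class probabilities from cosine similarity to class-name text features.

    logit_scale is an assumed temperature-like factor, not a value from the paper.
    """
    img = l2_normalize(image_feature)
    logits = []
    for text_feature in class_text_features:
        txt = l2_normalize(text_feature)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    # Numerically stable softmax over the class logits.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy, hand-written features (assumption): one text feature per class name.
class_names = ["cat", "dog", "car"]
toy_text_features = [
    [1.0, 0.1, 0.0],
    [0.1, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_image_feature = [0.9, 0.2, 0.1]

probs = name_only_probs(toy_image_feature, toy_text_features)
predicted = class_names[probs.index(max(probs))]
```

In this setup no real image is needed to define the classifier: the class names alone determine the text features. Fine-tuning on generated images, as the abstract describes, would then adapt the model starting from this point.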
Taxonomy
Topics: Multimodal Machine Learning Applications
