Match & Choose: Model Selection Framework for Fine-tuning Text-to-Image Diffusion Models
Basile Lewandowski, Robert Birke, Lydia Y. Chen

TL;DR
This paper introduces M&C, a novel framework that efficiently predicts the best pretrained text-to-image diffusion model for a target dataset without exhaustive fine-tuning, using a matching graph and performance prediction.
Contribution
The paper presents the first model selection framework for T2I models, leveraging a matching graph and graph embeddings to predict optimal models for fine-tuning on new datasets.
Findings
M&C predicts the best model in 61.3% of cases.
In the remaining cases, M&C selects a model whose performance closely matches that of the best model.
The framework reduces the need for exhaustive fine-tuning.
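To make the first two findings concrete, here is a minimal sketch of how a hit rate ("predicts the best model in X% of cases") and a gap to the best model could be computed from per-dataset scores. It is an illustration under stated assumptions, not the paper's evaluation code: the function name `selection_report`, the dictionary layout, the toy numbers, and the lower-is-better convention (as with FID-style metrics) are all hypothetical and do not come from the source.

```python
def selection_report(true_scores, predicted_scores, lower_is_better=True):
    """Summarize how well predicted scores pick the best model per dataset.

    Both arguments map dataset name -> {model name -> score} (assumed layout).
    Returns (hit_rate, mean_gap):
      hit_rate: fraction of datasets where the picked model achieves the
                best true score (ties count as hits);
      mean_gap: average absolute difference in true score between the
                picked model and the truly best model.
    """
    best = min if lower_is_better else max
    hits, gaps = [], []
    for dataset, truth in true_scores.items():
        predicted = predicted_scores[dataset]
        picked = best(predicted, key=predicted.get)  # model the selector would choose
        oracle = best(truth, key=truth.get)          # model that is actually best
        hits.append(truth[picked] == truth[oracle])
        gaps.append(abs(truth[picked] - truth[oracle]))
    return sum(hits) / len(hits), sum(gaps) / len(gaps)


# Toy, made-up scores for two datasets and two candidate models.
true_scores = {
    "a": {"m1": 10.0, "m2": 12.0},
    "b": {"m1": 20.0, "m2": 15.0},
}
predicted_scores = {
    "a": {"m1": 9.0, "m2": 13.0},
    "b": {"m1": 14.0, "m2": 16.0},
}
hit_rate, mean_gap = selection_report(true_scores, predicted_scores)
```

On this toy input the selector picks the right model for dataset "a" and the wrong one for "b", so `hit_rate` is 0.5 and `mean_gap` is 2.5. Reporting both numbers matters because a missed top-1 pick can still be a near-optimal choice.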
Abstract
Text-to-image (T2I) models based on diffusion and transformer architectures are advancing rapidly. They are often pretrained on large corpora and openly shared on model platforms such as HuggingFace. Users can then build AI applications, e.g., generating media content, by adopting pretrained T2I models and fine-tuning them on a target dataset. While public pretrained T2I models help democratize the technology, users face a new challenge: which model can be best fine-tuned for the target data domain? Model selection is well studied for classification tasks, but little is known about selecting (pretrained) T2I models or about indicators of their performance on the target domain. In this paper, we propose the first model selection framework, M&C, which enables users to efficiently choose a pretrained T2I model from a model platform without exhaustively fine-tuning them all on the target…
