You Only Submit One Image to Find the Most Suitable Generative Model

Zhi Zhou; Lan-Zhe Guo; Peng-Xiao Song; Yu-Feng Li

arXiv:2412.12232 · cs.CV · December 18, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces a novel method for efficiently identifying the most suitable generative model for a user's needs using only one example image, significantly improving model selection accuracy.

Contribution

It proposes a comprehensive solution with three modules for generative model identification, addressing the challenge of matching user requirements with large model repositories.

Findings

1. Achieves over 80% top-4 identification accuracy with a single image input

2. Demonstrates efficiency and effectiveness of the proposed approach

3. Addresses cross-modality and dimensionality challenges in model identification

Abstract

Deep generative models have achieved promising results in image generation, and various generative model hubs, e.g., Hugging Face and Civitai, have been developed that enable model developers to upload models and users to download them. However, these model hubs lack advanced model management and identification mechanisms, so users can only search for models through text matching, download sorting, etc., which makes it difficult to efficiently find the model that best meets their requirements. In this paper, we propose a novel setting called Generative Model Identification (GMI), which aims to enable the user to efficiently identify the most appropriate generative model(s) for their requirements from a large number of candidate models. To the best of our knowledge, this setting has not been studied yet. We introduce a comprehensive solution consisting of three pivotal modules: a…

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 2

Strengths

1. The problem setting appears novel and useful, especially when there are hundreds of models in a model zoo and one needs to find which model could be used to produce a given image.
2. The method is straightforward, and the exposition is very clear and easy to read.
3. Experiments show promising results against existing retrieval baselines.

Weaknesses

1. As I understand it, RKME scoring would work better if there are image distributions to match, that is, if there is more than one user-provided query image. In that sense, it is not clear to me how the paper can claim that a single image is sufficient to identify the model. How do you substantiate this claim? One thought is: to derive a more general scoring function (in Eq. 2, for example) with multiple user query images and the authors did an ablation that shows that using a single image or whe

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

- The problem setting addresses a practical need to identify the most suitable generative model amidst a myriad of options we have nowadays.
- The application of RKME as a similarity metric is interesting. There's potential relevance to the "Informative Features for Model Comparison" work, even though the latter primarily focuses on comparing just two models based on the goodness of fit with image queries.
- Separating precomputation and actual comparison is not new, but is a useful concept to s

Weaknesses

- The paper assumes that users will always provide an example image, which might not be universally applicable or intuitive.
- The work heavily relies on existing methods: RKME, a pre-trained vision-language model (possibly CLIP?), and the image interrogator from a GitHub repository. There's a lack of novelty in the proposed method.
- Perhaps it's just me, but I found the writing in the technical sections really confusing.
- While the image interrogator is referenced from an existing work, it

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. The application is interesting. There are many models online and how to identify the needed model efficiently is very important.
2. The proposed method is simple and effective.
3. The paper is overall well-written.

Weaknesses

1. The paper reads more like a technical report than an academic paper.
2. The technical contributions are limited. It is quite trivial to calculate the distance between the uploaded image/prompt and the existing images/prompts. The MMD distance is a very naive distance metric, and its robustness is questionable.
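
For readers unfamiliar with the distance the reviewers discuss, the following is a minimal sketch of ranking candidate models by squared MMD between one query feature vector and samples associated with each model. It is not the paper's implementation: it uses a plain MMD estimate rather than an RKME reduced set, and the RBF kernel, the `gamma` value, the function names, and the 2-D toy features are all assumptions made for illustration.

```python
import math
import random


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def squared_mmd(xs, ys, gamma=0.5):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def rank_models(query, model_samples, gamma=0.5):
    """Sort model names by increasing squared MMD to a single query vector."""
    scores = {
        name: squared_mmd([query], samples, gamma)
        for name, samples in model_samples.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical toy data: each "model" is a cluster of 2-D feature vectors.
rng = random.Random(0)


def sample_cluster(center, n=20, spread=0.3):
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(n)]


model_samples = {
    "model_a": sample_cluster([0.0, 0.0]),
    "model_b": sample_cluster([3.0, 3.0]),
    "model_c": sample_cluster([-3.0, 2.0]),
}
query = [2.9, 3.1]  # one query feature vector, standing in for one image
ranking = rank_models(query, model_samples)
print(ranking)
```

With a single query, the query-to-query term is constant, so the ranking is driven by how similar each model's samples are to the query and to each other. This is the setting Reviewer 01 asks about when questioning whether one image is enough.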

Taxonomy

Topics: Scientific Computing and Data Management · Semantic Web and Ontologies