Selecting Fine-Tuning Examples by Quizzing VLMs

Tenghao Ji; Eytan Adar

arXiv:2511.12002 · cs.LG · November 18, 2025

Open Access

TL;DR

This paper introduces QZLoRA, a method that uses automated visual reasoning to select high-quality images for fine-tuning text-to-image models, resulting in more accurate and representative generated images with fewer samples.

Contribution

The paper proposes QZLoRA, a novel framework that leverages QuizRank to automatically select images for low-rank adaptation, improving fine-tuning efficiency and output quality.

Findings

1. QZLoRA produces better aligned, photorealistic images with fewer samples.

2. Fine-tuned models generate more representative stylized images.

3. Automated image ranking enhances topic-specific generative modeling.

Abstract

A challenge in fine-tuning text-to-image diffusion models for specific topics is selecting good examples. Fine-tuning from image sets of varying quality, such as Wikimedia Commons, will often produce poor output. However, training images that do exemplify the target concept (e.g., a female Mountain Bluebird) help ensure that the generated images are similarly representative (e.g., have the prototypical blue wings and gray chest). In this work, we propose QZLoRA, a framework to select images for low-rank adaptation (LoRA). The approach leverages QuizRank, a method to automatically rank images by treating them as an 'educational intervention' and 'quizzing' a VLM. We demonstrate that QZLoRA can produce better aligned, photorealistic images with fewer samples. We also show that these fine-tuned models can produce stylized images that are similarly representative (i.e.,…
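
The selection idea in the abstract (rank candidate images by how well a VLM does on a quiz after seeing each one, then keep the best for LoRA training) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the function names `quiz_accuracy` and `select_top_k`, the exact-match scoring rule, the file names, and the hard-coded answers that stand in for real VLM output.

```python
from typing import Dict, List, Sequence


def quiz_accuracy(vlm_answers: Sequence[str], answer_key: Sequence[str]) -> float:
    """Fraction of quiz questions answered correctly.

    Hypothetical scoring rule: case-insensitive exact match per question.
    """
    if len(vlm_answers) != len(answer_key):
        raise ValueError("answers and key must have the same length")
    if not answer_key:
        return 0.0
    correct = sum(
        a.strip().lower() == k.strip().lower()
        for a, k in zip(vlm_answers, answer_key)
    )
    return correct / len(answer_key)


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k image ids with the highest quiz score, best first.

    Ties are broken alphabetically by image id so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [image_id for image_id, _ in ranked[:k]]


# Hypothetical quiz about the target concept, with made-up VLM answers
# recorded after showing each candidate image.
answer_key = ["blue", "gray", "yes"]
vlm_answers_by_image = {
    "img_a.jpg": ["blue", "gray", "yes"],
    "img_b.jpg": ["blue", "brown", "no"],
    "img_c.jpg": ["Blue", "gray", "no"],
}

scores = {
    image_id: quiz_accuracy(answers, answer_key)
    for image_id, answers in vlm_answers_by_image.items()
}
selected = select_top_k(scores, k=2)
print(selected)
```

In this sketch the images in `selected` would form the LoRA training set; how the quiz is generated and how many images are kept are choices the abstract does not specify.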

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications