A Human-in-the-Loop Framework for Efficient Prompt Selection in Microscopy Vision-Language Models
Abhiram Kandiyana, Ankur Mali, Lawrence O. Hall, Peter R. Mouton, Dmitry Goldgof

TL;DR
This paper introduces a human-in-the-loop framework for microscopy image classification that uses active learning to choose which images experts verify, reducing annotation effort compared to random selection.
Contribution
It formulates prompt-set construction as a target-driven active learning problem and demonstrates effective selection criteria under low-resource constraints.
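To make the idea of a target-driven selection loop concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the summary does not say which selection criteria the paper uses, so entropy-based uncertainty sampling (a common active learning criterion) is swapped in. The names `build_prompt_set`, `predict_proba`, `verify`, and `evaluate` are hypothetical, and the toy data stands in for a real VLM and a human expert.

```python
import math
from typing import Callable, List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def build_prompt_set(
    pool: Sequence[str],
    predict_proba: Callable[[List[Tuple[str, str]], str], Sequence[float]],
    verify: Callable[[str], str],
    evaluate: Callable[[List[Tuple[str, str]]], float],
    target: float,
    budget: int,
) -> Tuple[List[Tuple[str, str]], float]:
    """Grow a prompt set until the target score is met or the budget runs out.

    All four callables are hypothetical stand-ins:
      predict_proba(prompt_set, image) -> class probabilities from the model
      verify(image)                    -> expert-verified caption or label
      evaluate(prompt_set)             -> score on a held-out validation set
    """
    prompt_set: List[Tuple[str, str]] = []
    remaining = list(pool)
    score = evaluate(prompt_set)
    while remaining and len(prompt_set) < budget and score < target:
        # Assumed criterion: query the image the model is least sure about.
        pick = max(remaining, key=lambda x: entropy(predict_proba(prompt_set, x)))
        remaining.remove(pick)
        prompt_set.append((pick, verify(pick)))
        # Re-check the target after every verified exemplar.
        score = evaluate(prompt_set)
    return prompt_set, score


# Toy stand-ins so the sketch runs without a model or an annotator.
toy_probs = {
    "img_a": [0.5, 0.5],
    "img_b": [0.9, 0.1],
    "img_c": [0.6, 0.4],
    "img_d": [0.99, 0.01],
}
prompt_set, score = build_prompt_set(
    pool=list(toy_probs),
    predict_proba=lambda ps, img: toy_probs[img],  # ignores the prompt set
    verify=lambda img: "verified caption for " + img,
    evaluate=lambda ps: min(1.0, len(ps) / 4),     # fake validation score
    target=0.75,
    budget=10,
)
print([img for img, _ in prompt_set], score)
```

The stopping rule is what makes the loop target-driven: annotation ends as soon as the validation score reaches `target`, so the number of verified exemplars is an output of the procedure rather than a fixed input.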
Findings
Achieves 100% test accuracy with as few as 20 annotated images.
Reduces expert annotation effort compared to random selection.
Validates methods on microscopy image classification tasks.
Abstract
Deep-learning pipelines for microscopy image classification often require expensive, labor- and time-intensive expert annotation to produce high-quality ground truth for training. Recent work has shown that prompt tuning of vision-language models (VLMs) can reduce manual annotation by constructing a small prompt set of expert-verified image-caption exemplars that is reused as few-shot context to classify all remaining images at inference time. To further reduce effort, the VLM can draft captions for candidate exemplars, which experts then verify and lightly edit instead of writing text de novo. However, two practical questions remain unaddressed: (1) which unlabeled images should be prioritized for verification, and (2) how many verified exemplars are needed to reach a performance target. In this work, we address these questions by formulating prompt-set construction as a target-driven…
