FungalZSL: Zero-Shot Fungal Classification with Image Captioning Using a Synthetic Data Approach
Anju Rani, Daniel O. Arroyo, Petar Durdevic

TL;DR
This paper enhances zero-shot fungal classification by creating synthetic datasets using large language models and image generation, improving CLIP's ability to classify fungi at different growth stages.
Contribution
It introduces novel synthetic text and image datasets for fungi, generated via LLMs and image synthesis, to improve zero-shot classification in vision-language models.
Findings
Synthetic datasets improve classification accuracy.
Knowledge transfer between LLMs refines growth stage classification.
Alignment in shared representation space enhances model performance.
Abstract
The effectiveness of zero-shot classification in large vision-language models (VLMs), such as Contrastive Language-Image Pre-training (CLIP), depends on access to extensive, well-aligned text-image datasets. In this work, we introduce two complementary data sources: one generated by large language models (LLMs) to describe the stages of fungal growth, and another comprising a diverse set of synthetic fungi images. These datasets are designed to enhance CLIP's zero-shot classification capabilities for fungi-related tasks. To ensure effective alignment between text and image data, we project them into CLIP's shared representation space, focusing on different fungal growth stages. We generate text using LLaMA3.2 to bridge modality gaps and synthetically create fungi images. Furthermore, we investigate knowledge transfer by comparing text outputs from different LLM techniques to refine…
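As an illustration of what classification in a shared representation space involves, the sketch below scores one image embedding against one text embedding per growth stage using cosine similarity, then applies a softmax. It is a minimal sketch under stated assumptions, not the authors' pipeline: the growth-stage labels, the 3-dimensional toy vectors, and the temperature value are all hypothetical placeholders, and in practice the embeddings would come from CLIP's image and text encoders.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(image_emb, text_embs, labels, temperature=0.01):
    """Pick the label whose text embedding is closest to the image embedding.

    temperature is an assumed value chosen for illustration only.
    """
    img = normalize(image_emb)
    # Cosine similarity between the image and each caption embedding.
    sims = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    # Temperature-scaled softmax, shifted by the max logit for stability.
    logits = [s / temperature for s in sims]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical growth-stage labels and toy embeddings (not real CLIP outputs).
labels = ["spore", "hyphae", "mycelium"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.1, 0.9, 0.2]

predicted, probs = zero_shot_classify(image_emb, text_embs, labels)
print(predicted, [round(p, 3) for p in probs])
```

No training happens in this step: the prediction depends only on how well the text and image embeddings line up, which is why the quality of the text descriptions matters for zero-shot accuracy.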
Taxonomy
Topics: Cell Image Analysis Techniques · Digital Imaging for Blood Diseases · Image Processing Techniques and Applications
