TaxaAdapter: Vision Taxonomy Models are Key to Fine-grained Image Generation over the Tree of Life

Mridul Khurana; Amin Karimi Monsefi; Justin Lee; Medha Sawhney; David Carlyn; Julia Chae; Jianyang Gu; Rajiv Ramnath; Sara Beery; Wei-Lun Chao; Anuj Karpatne; Cheng Zhang

arXiv:2603.26128 · cs.CV · March 30, 2026

TL;DR

TaxaAdapter leverages Vision Taxonomy Models to significantly enhance fine-grained, species-specific image generation in text-to-image models, demonstrating improved fidelity, generalization, and interpretability.

Contribution

The paper introduces TaxaAdapter, a lightweight method that incorporates Vision Taxonomy Models (VTMs) into text-to-image diffusion models, enabling accurate, scalable, and flexible species-level image synthesis.

Findings

1. Improves morphology fidelity and species-identity accuracy.
2. Enables few-shot and zero-shot species generation.
3. Introduces a trait-level evaluation metric for morphological consistency.

Abstract

Accurately generating images across the Tree of Life is difficult: there are over 10M distinct species on Earth, many of which differ only by subtle visual traits. Despite the remarkable progress in text-to-image synthesis, existing models often fail to capture the fine-grained visual cues that define species identity, even when their outputs appear photo-realistic. To this end, we propose TaxaAdapter, a simple and lightweight approach that incorporates Vision Taxonomy Models (VTMs) such as BioCLIP to guide fine-grained species generation. Our method injects VTM embeddings into a frozen text-to-image diffusion model, improving species-level fidelity while preserving flexible text control over attributes such as pose, style, and background. Extensive experiments demonstrate that TaxaAdapter consistently improves morphology fidelity and species-identity accuracy over strong baselines,…
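The abstract says only that VTM embeddings are injected into a frozen text-to-image diffusion model, not how. As an illustration, here is a minimal sketch assuming one common mechanism, IP-Adapter-style decoupled cross-attention. The frozen text cross-attention is kept, a second key/value branch attends over the extra embedding, and the two outputs are summed. This is not presented as TaxaAdapter's actual architecture. The class `DecoupledCrossAttention`, the `scale` weight, the dimensions, and the random stand-in weights are all hypothetical. Plain Python lists replace a tensor library so the example runs without dependencies.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class DecoupledCrossAttention:
    """Hypothetical adapter layer: frozen text branch plus a new embedding branch.

    Assumed IP-Adapter-style design, not TaxaAdapter's confirmed architecture.
    """

    def __init__(self, d_latent: int, d_text: int, d_vtm: int, d_attn: int,
                 scale: float = 1.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Stand-ins for the pretrained (frozen) cross-attention projections.
        self.w_q = random_matrix(d_latent, d_attn, rng)
        self.w_k_text = random_matrix(d_text, d_attn, rng)
        self.w_v_text = random_matrix(d_text, d_attn, rng)
        # New key/value projections for the VTM embedding; in an adapter these
        # would be the only trained weights (no training loop is shown here).
        self.w_k_vtm = random_matrix(d_vtm, d_attn, rng)
        self.w_v_vtm = random_matrix(d_vtm, d_attn, rng)
        self.scale = scale  # hypothetical knob weighting the VTM branch

    def __call__(self, latents: Matrix, text_tokens: Matrix, vtm_tokens: Matrix) -> Matrix:
        q = matmul(latents, self.w_q)
        # Original text conditioning path (pose, style, background control).
        text_out = attention(q, matmul(text_tokens, self.w_k_text),
                             matmul(text_tokens, self.w_v_text))
        # Extra path attending over VTM tokens (e.g. a BioCLIP-style embedding).
        vtm_out = attention(q, matmul(vtm_tokens, self.w_k_vtm),
                            matmul(vtm_tokens, self.w_v_vtm))
        return [[t + self.scale * v for t, v in zip(t_row, v_row)]
                for t_row, v_row in zip(text_out, vtm_out)]


# Usage with toy sizes: 4 latent positions, 5 text tokens, 1 pooled VTM vector.
rng = random.Random(1)
latents = random_matrix(4, 8, rng)
text = random_matrix(5, 6, rng)
vtm = random_matrix(1, 10, rng)
layer = DecoupledCrossAttention(d_latent=8, d_text=6, d_vtm=10, d_attn=16)
out = layer(latents, text, vtm)
print(len(out), len(out[0]))  # → 4 16
```

Setting `scale=0.0` recovers the frozen model's text-only cross-attention exactly. This is one reason the design is popular for adapters: the base model's behavior is unchanged until the new branch is trained and switched on.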

Code & Models

Repositories

ufere/Assingment_1 (GitHub)