BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning

Jianyang Gu; Samuel Stevens; Elizabeth G Campolongo; Matthew J Thompson; Net Zhang; Jiaman Wu; Andrei Kopanev; Zheda Mai; Alexander E. White; James Balhoff; Wasila Dahdul; Daniel Rubenstein; Hilmar Lapp; Tanya Berger-Wolf; Wei-Lun Chao; Yu Su

arXiv:2505.23883·cs.CV·October 24, 2025

TL;DR

BioCLIP 2, a vision-language model trained on a large-scale biological image dataset, exhibits emergent properties that go beyond its species-level training objective, such as species embeddings that align with ecological and functional meanings.

Contribution

This work introduces BioCLIP 2, trained on TreeOfLife-200M, the largest biological organism image dataset to date, and reveals emergent properties and hierarchical structure in the embedding space it learns.

Findings

01. The embedding space aligns with ecological and functional meanings.

02. Intra-species variations are preserved and well separated.

03. Emergent properties become more pronounced with larger-scale training data.
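A rough way to check whether species form distinct groups in an embedding space is to compare the average cosine similarity between embeddings of the same species with that between embeddings of different species. The sketch below is a hypothetical probe, not the paper's evaluation protocol. It assumes embeddings are already computed, and it uses synthetic vectors in place of real BioCLIP 2 outputs. The helper names are illustrative.

```python
import numpy as np


def cosine_similarity_matrix(emb: np.ndarray) -> np.ndarray:
    # Normalize rows to unit length, then take all pairwise dot products.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def intra_inter_similarity(emb: np.ndarray, species: list[str]) -> tuple[float, float]:
    """Mean cosine similarity within species and between species.

    Hypothetical helper for illustration only. Each species needs at
    least two embeddings for the within-species mean to be defined.
    """
    labels = np.asarray(species)
    sim = cosine_similarity_matrix(emb)
    same = labels[:, None] == labels[None, :]     # True where two rows share a species
    not_self = ~np.eye(len(labels), dtype=bool)   # drop each row's similarity with itself
    intra = sim[same & not_self].mean()
    inter = sim[~same].mean()
    return float(intra), float(inter)


# Toy usage: two synthetic "species" clustered around random centers.
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 32))
species = ["A", "A", "A", "B", "B", "B"]
emb = centers[[0, 0, 0, 1, 1, 1]] + 0.05 * rng.normal(size=(6, 32))
intra, inter = intra_inter_similarity(emb, species)
print(f"within-species: {intra:.3f}, between-species: {inter:.3f}")
```

A large gap between the two means suggests species occupy separate regions. The probe says nothing about finer structure, such as whether intra-species variation (for example, life stage or sex) is preserved along consistent directions. Probing that would require labels for those attributes.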

Abstract

Foundation models trained at scale exhibit remarkable emergent behaviors, learning new capabilities beyond their initial training objectives. We find such emergent behaviors in biological vision models via large-scale contrastive vision-language training. To achieve this, we first curate TreeOfLife-200M, comprising 214 million images of living organisms, the largest and most diverse biological organism image dataset to date. We then train BioCLIP 2 on TreeOfLife-200M to distinguish different species. Despite the narrow training objective, BioCLIP 2 yields extraordinary accuracy when applied to various biological visual tasks such as habitat classification and trait prediction. We identify emergent properties in the learned embedding space of BioCLIP 2. At the inter-species level, the embedding distribution of different species aligns closely with functional and ecological meanings…
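The abstract describes contrastive vision-language training in which each image is matched to text naming its species. Below is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss. It rests on several assumptions. Image and text embeddings are precomputed, and the encoders are out of scope. The temperature is fixed at 0.07, which is CLIP's initial value, though CLIP learns it during training. The caption format, which spells out the full taxonomic path, is hypothetical. The names `taxonomic_caption` and `clip_contrastive_loss` are illustrative and are not taken from the BioCLIP 2 code.

```python
import numpy as np


def taxonomic_caption(ranks: list[str]) -> str:
    # Hypothetical text format: flatten the taxonomy (kingdom down to species)
    # into one caption so the text encoder sees the whole hierarchy.
    return "a photo of " + " ".join(ranks) + "."


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Unit-length rows, so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(image_emb: np.ndarray, text_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: row i of image_emb is paired with row i of text_emb."""
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature
    diag = np.arange(len(logits))
    image_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()  # each image picks its caption
    text_to_image = -log_softmax(logits, axis=0)[diag, diag].mean()  # each caption picks its image
    return float((image_to_text + text_to_image) / 2)


print(taxonomic_caption(["Animalia", "Chordata", "Aves", "Passeriformes",
                         "Corvidae", "Corvus", "corax"]))
# prints "a photo of Animalia Chordata Aves Passeriformes Corvidae Corvus corax."
rng = np.random.default_rng(0)
images = rng.normal(size=(8, 64))
print(clip_contrastive_loss(images, images))        # matched pairs: low loss
print(clip_contrastive_loss(images, images[::-1]))  # every pair mismatched: higher loss
```

In a real batch, several images may share a species and therefore an identical caption. This plain version still treats them as negatives for one another, so a full training setup would need to decide how to handle such duplicates.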

Code & Models

Repositories

imageomics/treeoflife-toolbox
jax · Official

Datasets

imageomics/TreeOfLife-200M
dataset · 3.1k downloads

Taxonomy

Topics: Biomedical Text Mining and Ontologies