CrypticBio: A Large Multimodal Dataset for Visually Confusing Biodiversity
Georgiana Manolache, Gerard Schouten, Joaquin Vanschoren

TL;DR
CrypticBio is the largest publicly available multimodal dataset of visually confusing species. It pairs images with rich annotations and contextual cues, such as geographical and temporal data, to support AI models for biodiversity research.
Contribution
The paper introduces CrypticBio, a dataset with multimodal annotations for cryptic species, and provides a benchmark showing the importance of contextual cues in species identification.
Findings
Geographical context significantly improves zero-shot species identification.
State-of-the-art models show substantial performance gains with multimodal data.
CrypticBio enables research on visual and contextual cues for cryptic species recognition.
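One way to picture how geographical context can help separate visually similar species is to re-weight image-only zero-shot scores with a location-based prior. The sketch below is illustrative only and is not the benchmark procedure from the paper: the function names, the species labels, the scores, and the prior values are all hypothetical, and multiplying a softmax over image scores by a prior is just one simple fusion choice among many.

```python
import math

def softmax(scores):
    """Convert raw similarity scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rerank_with_geo_prior(image_scores, geo_prior, eps=1e-6):
    """Fuse image-only scores with a location prior (hypothetical helper).

    image_scores: dict mapping species name -> zero-shot similarity score
    geo_prior:    dict mapping species name -> prior probability of the
                  species occurring at the observation's location
    eps:          small constant so unseen species are not zeroed out
    """
    species = list(image_scores)
    probs = softmax([image_scores[s] for s in species])
    weighted = [p * (geo_prior.get(s, 0.0) + eps) for p, s in zip(probs, species)]
    total = sum(weighted)
    return {s: w / total for s, w in zip(species, weighted)}

# Hypothetical cryptic pair: the image model barely prefers "Species A",
# but "Species B" is far more common at the observation's location.
image_scores = {"Species A": 2.1, "Species B": 2.0}
geo_prior = {"Species A": 0.05, "Species B": 0.95}

image_only_best = max(image_scores, key=image_scores.get)
posterior = rerank_with_geo_prior(image_scores, geo_prior)
fused_best = max(posterior, key=posterior.get)
print(image_only_best, fused_best)
```

In this toy case the image-only ranking and the fused ranking disagree, which is the kind of situation where contextual cues would matter for a cryptic group.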
Abstract
We present CrypticBio, the largest publicly available multimodal dataset of visually confusing species, specifically curated to support the development of AI models in the context of biodiversity applications. Visually confusing or cryptic species are groups of two or more taxa that are nearly indistinguishable based on visual characteristics alone. While much existing work addresses taxonomic identification in a broad sense, datasets that directly address the morphological confusion of cryptic species are small, manually curated, and target only a single taxon. Thus, the challenge of identifying such subtle differences in a wide range of taxa remains unaddressed. Curated from real-world trends in species misidentification among community annotators of iNaturalist, CrypticBio contains 52K unique cryptic groups spanning 67K species, represented in 166 million images. Rich research-grade…
Taxonomy
Topics: Species Distribution and Climate Change · Identification and Quantification in Food · Environmental DNA in Biodiversity Studies
