DivShift: Exploring Domain-Specific Distribution Shifts in Large-Scale, Volunteer-Collected Biodiversity Datasets
Elena Sierra, Lauren E. Gillespie, Salim Soltani, Moises Exposito-Alonso, Teja Kattenborn

TL;DR
This paper introduces DivShift, a framework and dataset for analyzing how biases in volunteer-collected biodiversity data affect machine learning models for species recognition. It finds that these biases affect performance less than label distribution shifts do, but that they still warrant caution.
Contribution
The paper presents DivShift, a novel framework and curated dataset for quantifying domain-specific biases in biodiversity data, and analyzes their impact on species recognition models.
Findings
Biases confound model performance less than label distribution shifts.
More data improves performance, but bias-specific effects vary.
Biases in volunteer data can affect biodiversity monitoring accuracy.
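As an illustration of the first finding, the following minimal sketch shows one generic way to compare two data partitions (for example, two regions or two observer groups): measure how far apart their label distributions are, and separately measure accuracy on each. It uses Jensen-Shannon divergence as the shift measure, which is an assumption made for illustration; this summary does not state which measure the paper uses. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def label_distribution(labels, classes):
    """Relative frequency of each class in `labels`, ordered as in `classes`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up species labels for two hypothetical partitions.
classes = ["oak", "pine", "fern"]
partition_a = ["oak", "oak", "pine", "fern"]
partition_b = ["fern", "fern", "fern", "pine"]

shift = js_divergence(
    label_distribution(partition_a, classes),
    label_distribution(partition_b, classes),
)
print(f"label shift (JS divergence) between partitions: {shift:.3f}")
```

Comparing the accuracy gap between partitions against this divergence gives a rough sense of how much of a performance difference coincides with label shift rather than with the bias itself.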
Abstract
Large-scale, volunteer-collected datasets of community-identified natural world imagery like iNaturalist have enabled marked performance gains for fine-grained visual classification of species using machine learning methods. However, such data -- sometimes referred to as citizen science data -- are opportunistic and lack a structured sampling strategy. This volunteer-collected biodiversity data contains geographic, temporal, taxonomic, observer, and sociopolitical biases that can have significant effects on biodiversity model performance, but whose impacts are unclear for fine-grained species recognition performance. Here we introduce Diversity Shift (DivShift), a framework for quantifying the effects of domain-specific distribution shifts on machine learning model performance. To diagnose the performance effects of biases specific to volunteer-collected biodiversity data, we also…
Taxonomy
Topics: Species Distribution and Climate Change · Environmental DNA in Biodiversity Studies · Genomics and Phylogenetic Studies
