Finding Inter-species Associations on Large Citizen Science Datasets
Jacob Deutsch

TL;DR
This paper presents a scalable method for identifying species associations from large citizen science datasets by analyzing spatial overlaps and comparing them to a null model, effectively filtering out spurious correlations.
Contribution
The paper introduces a novel, efficient approach to detect likely inter-species associations in large-scale citizen science data, accounting for observer bias and spatial heterogeneity.
Findings
Successfully identified known insect host-plant relationships.
Detected associations between Yerba Santa Beetles and California Yerba Santa.
Efficiently analyzed approximately 10^8 species pairs on modest hardware.
Abstract
Determining associations among different species from citizen science databases is challenging because observer behavior and intrinsic density variations give rise to correlations that do not imply species associations. This paper introduces a method that can efficiently analyze large datasets to extract likely species associations. It tiles space into small blocks sized to match the accuracy of the data coordinates and reduces observations to presence/absence per tile in order to compute pairwise overlaps. It compares these overlaps with a spatial Poisson process that serves as a null model. For each species, an expected overlap is estimated by averaging normalized overlaps over other species in the same vicinity. This gives a z-score for the significance of a species-species association and a correlation index for the strength of this association. This was tested on…
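The tiling and overlap steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names, the `tile_deg` tile size, and the toy coordinates are all hypothetical. In particular, the z-score below uses a plain independence (hypergeometric) null over a fixed number of tiles as a stand-in; the paper's actual null model is a spatial Poisson process with expected overlaps averaged over nearby species, which this sketch does not attempt to reproduce.

```python
import math
from collections import defaultdict
from itertools import combinations

def tile_of(lat, lon, tile_deg=0.01):
    """Map a coordinate to an integer tile index (tile_deg is a hypothetical size)."""
    return (math.floor(lat / tile_deg), math.floor(lon / tile_deg))

def presence_tiles(observations, tile_deg=0.01):
    """Reduce (species, lat, lon) records to the set of occupied tiles per species."""
    tiles = defaultdict(set)
    for species, lat, lon in observations:
        tiles[species].add(tile_of(lat, lon, tile_deg))
    return tiles

def pairwise_overlaps(tiles):
    """Count shared tiles per species pair using an inverted index tile -> species.

    Only pairs that co-occur in at least one tile are ever visited.
    """
    by_tile = defaultdict(set)
    for species, occupied in tiles.items():
        for t in occupied:
            by_tile[t].add(species)
    overlaps = defaultdict(int)
    for species_here in by_tile.values():
        for a, b in combinations(sorted(species_here), 2):
            overlaps[(a, b)] += 1
    return overlaps

def independence_z(overlap, n_a, n_b, n_tiles):
    """z-score of an overlap under a simple independence null (an assumption here).

    Treats species b's n_b tiles as a random draw from n_tiles tiles, n_a of
    which hold species a, so the overlap is hypergeometric.
    """
    mean = n_a * n_b / n_tiles
    var = mean * (1 - n_a / n_tiles) * (n_tiles - n_b) / (n_tiles - 1)
    return (overlap - mean) / math.sqrt(var) if var > 0 else 0.0

# Toy data: a beetle and a plant observed in the same two tiles, plus an unrelated oak.
observations = [
    ("beetle", 37.001, -122.001), ("plant", 37.002, -122.002),
    ("beetle", 37.052, -122.052), ("plant", 37.055, -122.055),
    ("oak", 38.003, -121.003),
]
tiles = presence_tiles(observations)
overlaps = pairwise_overlaps(tiles)
z = independence_z(overlaps[("beetle", "plant")],
                   len(tiles["beetle"]), len(tiles["plant"]), n_tiles=100)
```

Working with per-tile presence sets rather than raw counts means repeated observations of the same species at one spot contribute only once, which is one way to blunt observer-effort effects before any null model is applied.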
Taxonomy
Topics: Species Distribution and Climate Change · Genetics, Bioinformatics, and Biomedical Research · Bioinformatics and Genomic Networks
