A multi-modal dataset for insect biodiversity with imagery and DNA at the trap and individual level
Johanna Orsholm, John Quinto, Hannu Autto, Gaia Banelyte, Nicolas Chazot, Jeremy deWaard, Stephanie deWaard, Arielle Farrell, Brendan Furneaux, Bess Hardwick, Nao Ito, Amlan Kar, Oula Kalttopää, Deirdre Kerdraon, Erik Kristensen, Jaclyn McKeown, Tommi Mononen, Ellen Nein

TL;DR
This paper introduces MassID45, a multi-modal dataset that combines imagery and DNA data for insect samples at both the bulk-sample and individual-specimen levels. It is intended for training automatic classifiers and segmentation models for unsorted bulk insect samples, supporting both ecological research and machine learning method development.
Contribution
The MassID45 dataset, which integrates molecular and imaging data at both the sample and individual-specimen levels, is a new resource for insect biodiversity analysis.
Findings
The dataset enables training of classifiers for bulk insect samples.
It supports development of segmentation and identification algorithms for individual specimens in bulk images.
It facilitates large-scale ecological and machine learning research.
Abstract
Insects comprise millions of species, many experiencing severe population declines under environmental and habitat changes. High-throughput approaches are crucial for accelerating our understanding of insect diversity, with DNA barcoding and high-resolution imaging showing strong potential for automatic taxonomic classification. However, most image-based approaches rely on individual specimen data, unlike the unsorted bulk samples collected in large-scale ecological surveys. We present the Mixed Arthropod Sample Segmentation and Identification (MassID45) dataset for training automatic classifiers of bulk insect samples. It uniquely combines molecular and imaging data at both the unsorted sample level and the full set of individual specimens. Human annotators, supported by an AI-assisted tool, performed two tasks on bulk images: creating segmentation masks around each individual…
Taxonomy
Topics: Environmental DNA in Biodiversity Studies · Species Distribution and Climate Change · Cell Image Analysis Techniques
