Image-based Automated Species Identification: Can Virtual Data Augmentation Overcome Problems of Insufficient Sampling?
Morris Klasen, Dirk Ahrens, Jonas Eberle, and Volker Steinhage

TL;DR
This study explores a two-level data augmentation method combining image generation and feature space oversampling to improve automated species identification with limited training data.
Contribution
It introduces a novel two-level augmentation approach that enhances machine learning performance in species identification from scarce data samples.
Findings
Augmentation outperforms the non-augmented deep learning baseline.
Augmentation also surpasses traditional 2D morphometric methods.
The approach is effective on challenging datasets of scarab beetles.
Abstract
Automated species identification and delimitation is challenging, particularly in rare and thus often scarcely sampled species, which do not allow sufficient discrimination of infraspecific versus interspecific variation. Typical problems arising from either low or exaggerated interspecific morphological differentiation are best met by automated methods of machine learning that learn efficient and effective species identification from training samples. However, limited infraspecific sampling remains a key challenge also in machine learning. In this study, we assessed whether a two-level data augmentation approach may help to overcome the problem of scarce training data in automated visual species identification. The first level of visual data augmentation applies classic approaches of data augmentation and generation of faked images using a GAN approach. Descriptive feature vectors are…
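To make the second augmentation level more concrete, the following is a minimal sketch of oversampling in feature space. The abstract excerpt does not state which oversampling algorithm the authors use, so the SMOTE-style interpolation between same-class feature vectors shown here is an assumption, as are the function names `global_average_pool` and `oversample_class` and the array shapes. Global average pooling is used to turn feature maps into descriptive vectors because it is listed among the methods for this paper.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Collapse (n, height, width, channels) feature maps to (n, channels) vectors."""
    return feature_maps.mean(axis=(1, 2))


def oversample_class(features, n_new, rng):
    """Create n_new synthetic vectors for ONE class by linear interpolation.

    Assumption: SMOTE-style interpolation between random pairs of real
    feature vectors; the paper's actual oversampling scheme may differ.
    """
    n = features.shape[0]
    if n < 2:
        raise ValueError("need at least two samples to interpolate")
    # Pick a random real sample i and a different partner j for each new vector.
    i = rng.integers(0, n, size=n_new)
    j = (i + rng.integers(1, n, size=n_new)) % n
    # Interpolation weight in [0, 1) places the new vector between i and j.
    t = rng.random((n_new, 1))
    return features[i] + t * (features[j] - features[i])


# Hypothetical example: 5 specimens of a rare species, 4x4 maps with 8 channels.
rng = np.random.default_rng(0)
maps = rng.random((5, 4, 4, 8))
real_vectors = global_average_pool(maps)
synthetic_vectors = oversample_class(real_vectors, n_new=20, rng=rng)
```

Because each synthetic vector is a convex combination of two real ones, it stays inside the range spanned by the observed specimens of that class; this keeps the sketch conservative but also means it cannot represent variation beyond what was sampled.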
Taxonomy
Methods: Principal Components Analysis · Global Average Pooling · Average Pooling
