# Improving the accuracy of automated labeling of specimen images datasets via a confidence-based process

**Authors:** Quentin Bateux, Jonathan Koss, Patrick W. Sweeney, Erika Edwards, Nelson Rios, Aaron M. Dollar, Jinshan Xu

PMC · DOI: 10.1371/journal.pcbi.1013650 · 2025-11-12

## TL;DR

This paper introduces a method to boost the accuracy of automated labeling of specimen images by filtering out low-confidence predictions, enabling more reliable use in scientific research.

## Contribution

The paper introduces a confidence-based filtering approach that improves the accuracy of automated specimen labeling by rejecting predictions whose confidence falls below a user-chosen threshold.
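The filtering idea can be stated compactly. The notation below is ours, and it assumes that a label's "confidence" is the network's highest class probability, which this summary does not specify. The model outputs probabilities $p_k(x)$ over classes $k$, a label is kept only if its confidence reaches the threshold $\tau$, and the two quantities traded off against each other are coverage (the fraction of labels kept) and accuracy on the kept labels:

```latex
\hat{y}(x) = \arg\max_k p_k(x), \qquad c(x) = \max_k p_k(x), \qquad
\text{keep } \hat{y}(x) \iff c(x) \ge \tau

\operatorname{cov}(\tau) = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}[c(x_i) \ge \tau],
\qquad
\operatorname{acc}(\tau) =
\frac{\sum_{i=1}^{N} \mathbf{1}[c(x_i) \ge \tau]\,\mathbf{1}[\hat{y}(x_i) = y_i]}
     {\sum_{i=1}^{N} \mathbf{1}[c(x_i) \ge \tau]}
```

Raising $\tau$ can only lower $\operatorname{cov}(\tau)$. The method relies on $\operatorname{acc}(\tau)$ tending to rise as $\tau$ increases, although this is not guaranteed to be monotonic.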

## Key findings

- A naive model with 86% initial accuracy reaches over 95% accuracy when about 40% of its labels are rejected, and over 99% when about 65% are rejected.
- The method allows researchers to adjust accuracy-coverage trade-offs based on their specific needs.
- The approach was successfully applied to a dataset of over 600,000 herbarium specimens to label reproductive states.
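A minimal Python sketch of how one threshold yields one accuracy/coverage point. It assumes that confidence is the top class probability. The function name `selective_metrics` and the toy data are hypothetical, and this is not the authors' code.

```python
from typing import Sequence, Tuple


def selective_metrics(
    probs: Sequence[Sequence[float]],
    true_labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, accuracy) for predictions with confidence >= threshold.

    Assumption: confidence is the top class probability of each prediction.
    Accuracy is NaN when no prediction passes the threshold.
    """
    if len(probs) != len(true_labels):
        raise ValueError("probs and true_labels must have the same length")
    if not probs:
        return 0.0, float("nan")
    accepted = correct = 0
    for p, y in zip(probs, true_labels):
        confidence = max(p)
        predicted = max(range(len(p)), key=lambda k: p[k])  # argmax class index
        if confidence >= threshold:
            accepted += 1
            correct += int(predicted == y)
    coverage = accepted / len(probs)
    accuracy = correct / accepted if accepted else float("nan")
    return coverage, accuracy


# Hypothetical toy data: 5 specimens, 2 classes (0 and 1).
probs = [[0.95, 0.05], [0.60, 0.40], [0.20, 0.80], [0.55, 0.45], [0.10, 0.90]]
labels = [0, 1, 1, 0, 1]
for t in (0.0, 0.7, 0.9):
    cov, acc = selective_metrics(probs, labels, t)
    print(f"threshold={t:.1f} coverage={cov:.2f} accuracy={acc:.2f}")
```

On this toy data, a threshold of 0.0 keeps every label at 80% accuracy, and 0.7 keeps 60% of labels at 100% accuracy. Sweeping the threshold traces out the trade-off curve that researchers choose from.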

## Abstract

The digitization of natural history collections over the past three decades has unlocked a treasure trove of specimen imagery and metadata. There is great interest in making this data more useful by further labeling it with additional trait data, and modern “deep learning” machine learning techniques utilizing convolutional neural nets (CNNs) and similar networks show particular promise to reduce the amount of required manual labeling by human experts, making the process much faster and less expensive. However, in most cases, the accuracy of these approaches is too low for reliable utilization of the automatic labeling, typically in the range of 80-85% accuracy. In this paper, we present and validate an approach that can greatly improve this accuracy, essentially by examining the “confidence” that the network has in the generated label as well as utilizing a user-defined threshold to reject labels that fall below a chosen level. We demonstrate that a naive model that produced 86% initial accuracy can achieve improved performance - over 95% accuracy (rejecting about 40% of the labels) or over 99% accuracy (rejecting about 65%) by selecting higher confidence thresholds. This gives flexibility to adapt existing models to the statistical requirements of various types of research and has the potential to move these automatic labeling approaches from being unusably inaccurate to being an invaluable new tool. After validating the approach in a number of ways, we annotate the reproductive state of a large dataset of over 600,000 herbarium specimens. The analysis of the results points at under-investigated correlations as well as general alignment with known trends. By sharing this new dataset alongside this work, we want to allow biologists to gather insights for their own research questions, at their chosen point of accuracy/coverage trade-off.
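The abstract describes a user-defined threshold chosen to hit a target accuracy. One plausible way to pick it is to use a held-out, expert-labeled validation set and take the lowest threshold whose kept labels meet the target, since a lower threshold keeps more labels. The procedure and the name `pick_threshold` are assumptions for illustration, not the paper's documented procedure.

```python
from typing import Optional, Sequence


def pick_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    target_accuracy: float,
) -> Optional[float]:
    """Lowest threshold whose accepted labels reach target_accuracy.

    `confidences` and `correct` describe predictions on a held-out,
    expert-labeled validation set (hypothetical inputs). Candidate thresholds
    are the observed confidence values; the lowest one that meets the target
    keeps the most labels. Returns None if no threshold meets the target.
    """
    if len(confidences) != len(correct):
        raise ValueError("inputs must have the same length")
    # Most confident first: accepting a prefix of this order is the same as
    # applying a threshold equal to the last accepted confidence.
    pairs = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)
    best: Optional[float] = None
    n_correct = 0
    for i, (conf, ok) in enumerate(pairs):
        n_correct += int(ok)
        # A threshold of `conf` also accepts every tie, so only evaluate at
        # the end of a run of equal confidence values.
        end_of_ties = i + 1 == len(pairs) or pairs[i + 1][0] < conf
        if end_of_ties and n_correct / (i + 1) >= target_accuracy:
            best = conf
    return best


# Hypothetical validation results.
val_conf = [0.95, 0.90, 0.80, 0.60, 0.55]
val_ok = [True, True, True, False, True]
print(pick_threshold(val_conf, val_ok, 0.95))  # 0.8
print(pick_threshold(val_conf, val_ok, 0.80))  # 0.55
```

Because accuracy need not rise monotonically with the threshold, the loop checks every candidate rather than stopping at the first one that fails.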

## Author summary

In recent years, museums and research institutions have digitized vast collections of plant and animal specimens, creating an enormous amount of images and data. Scientists are eager to enhance this data by adding more detailed traits, but manually labeling each specimen is time-consuming and expensive. Deep learning methods, such as convolutional neural networks (CNNs), offer a way to speed up this process by automating labeling. However, these methods are often not accurate enough for scientific use, typically achieving only 80-85% accuracy.

In this study, we introduce a method to significantly improve the accuracy of automatic labeling. By assessing the “confidence” of the model’s predictions and setting a threshold to filter out uncertain labels, we show that accuracy can be increased dramatically. A basic model with 86% accuracy, for example, can exceed 95% accuracy (by rejecting 40% of labels) or even 99% accuracy (by rejecting 65%). This allows researchers to adjust the trade-off between accuracy and data coverage based on their needs.
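The reported figures imply how strongly errors concentrate among low-confidence labels. The worked bound below treats the rounded numbers (86%, 95% at 40% rejected, 99% at 65% rejected) as exact, so the percentages are approximate. $N$ is the number of labeled specimens.

```latex
\begin{aligned}
\text{all errors} &= (1 - 0.86)\,N = 0.14\,N \\
\text{errors kept (95\% acc., 60\% kept)} &\le (1 - 0.95)(0.60\,N) = 0.03\,N \\
\text{errors rejected} &\ge 0.14\,N - 0.03\,N = 0.11\,N \approx 79\% \text{ of all errors} \\
\text{errors kept (99\% acc., 35\% kept)} &\le (1 - 0.99)(0.35\,N) = 0.0035\,N \\
\text{errors rejected} &\ge 0.14\,N - 0.0035\,N = 0.1365\,N \approx 97.5\% \text{ of all errors}
\end{aligned}
```

At the first operating point, the rejected 40% therefore holds roughly four fifths of all mistakes. Its own accuracy is at most $(0.40 - 0.11)/0.40 = 72.5\%$.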

To demonstrate the method’s usefulness, we applied it to a dataset of over 600,000 herbarium specimens, specifically labeling their reproductive state. This new dataset, shared alongside our study, provides biologists with a valuable resource to explore patterns in plant reproduction with confidence.
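Sharing the dataset lets each biologist pick their own accuracy/coverage point. If every released label carries its confidence score, downstream filtering becomes a one-line threshold. The CSV layout, column names, and label values below are hypothetical, not the schema of the released dataset.

```python
import csv
import io
from typing import Dict, List

# Hypothetical export: one row per specimen with a predicted reproductive-state
# label and the model's confidence. Column names and label values are
# assumptions for illustration, not the schema of the released dataset.
SAMPLE = """specimen_id,predicted_label,confidence
A001,flowering,0.97
A002,sterile,0.62
A003,fruiting,0.88
A004,sterile,0.99
"""


def filter_by_confidence(csv_text: str, threshold: float) -> List[Dict[str, str]]:
    """Keep only rows whose confidence is at or above the chosen threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row["confidence"]) >= threshold]


for threshold in (0.85, 0.95):
    kept = filter_by_confidence(SAMPLE, threshold)
    print(threshold, [row["specimen_id"] for row in kept])
```

A study that needs high accuracy would use a stricter threshold and accept fewer specimens. A study that favors coverage would do the opposite, with no retraining in either case.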

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

26 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12629457/full.md

---
Source: https://tomesphere.com/paper/PMC12629457