Improving the accuracy of automated labeling of specimen images datasets via a confidence-based process

Quentin Bateux; Jonathan Koss; Patrick W. Sweeney; Erika Edwards; Nelson Rios; Aaron M. Dollar

arXiv:2411.10074·cs.CV·November 25, 2024

Open Access

TL;DR

This paper introduces a confidence-based method to significantly improve the accuracy of automated specimen image labeling using deep learning, enabling customizable trade-offs between accuracy and label rejection.

Contribution

It presents a confidence-thresholding approach that raises label accuracy from roughly 80-85% to over 95% or over 99%, depending on how many low-confidence labels the user is willing to reject.
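
The thresholding idea can be illustrated with a minimal sketch. It assumes the network outputs a per-class probability vector for each image and that "confidence" means the highest probability in that vector; the summary here does not state which confidence measure the authors use, so that choice is an assumption, and the function names `filter_by_confidence` and `accuracy_and_rejection` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def filter_by_confidence(
    probs: Sequence[Sequence[float]], threshold: float
) -> List[Optional[int]]:
    """Return the predicted class index per sample, or None if rejected.

    Assumption: confidence is the maximum class probability.
    """
    labels: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=lambda i: p[i])
        # Keep the label only when the network is confident enough.
        labels.append(best if p[best] >= threshold else None)
    return labels


def accuracy_and_rejection(
    labels: Sequence[Optional[int]], truth: Sequence[int]
) -> Tuple[float, float]:
    """Accuracy over the kept labels, and the fraction of labels rejected."""
    kept = [(l, t) for l, t in zip(labels, truth) if l is not None]
    rejection = 1 - len(kept) / len(truth)
    accuracy = sum(l == t for l, t in kept) / len(kept) if kept else float("nan")
    return accuracy, rejection


# Toy data (made up for illustration, not from the paper).
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.6, 0.4]]
truth = [0, 1, 1, 0]

print(accuracy_and_rejection(filter_by_confidence(probs, 0.0), truth))
print(accuracy_and_rejection(filter_by_confidence(probs, 0.7), truth))
```

On this toy data, raising the threshold from 0.0 to 0.7 discards the two least confident predictions, one of which was wrong, so accuracy on the kept labels goes up while half the labels are rejected.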

Findings

01

Achieved over 95% accuracy by rejecting 40% of low-confidence labels.

02

Achieved over 99% accuracy by rejecting 65% of labels.

03

Validated the approach on a dataset of 600,000 herbarium specimens.
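
The accuracy-versus-rejection trade-off in these findings suggests a simple way to pick a threshold for a given target accuracy. The sketch below is one possible procedure, not necessarily the authors': it assumes a held-out validation set with one confidence score per prediction and a flag saying whether each prediction was correct, and it returns the threshold that meets the target while rejecting as few labels as possible. The name `pick_threshold` and the toy numbers are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def pick_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    target_accuracy: float,
) -> Optional[Tuple[float, float]]:
    """Return (threshold, rejection_rate) meeting the target, or None.

    Labels with confidence >= threshold are kept. Among all thresholds
    whose kept labels reach the target accuracy, the lowest one is
    returned, which minimizes the rejection rate.
    """
    # Most confident predictions first.
    order = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)
    best = None
    hits = 0
    for k, (conf, ok) in enumerate(order, start=1):
        hits += int(ok)
        # A threshold cannot separate predictions with equal confidence,
        # so only evaluate a cut once the whole tied group is included.
        at_tie = k < len(order) and order[k][0] == conf
        if not at_tie and hits / k >= target_accuracy:
            best = (conf, 1 - k / len(order))
    return best


# Toy validation data (made up for illustration, not from the paper).
confidences = [0.99, 0.95, 0.9, 0.8, 0.7, 0.6]
correct = [True, True, True, False, True, False]

print(pick_threshold(confidences, correct, 0.95))
print(pick_threshold(confidences, correct, 0.80))
```

A stricter target pushes the threshold up and the rejection rate with it, which mirrors the pattern reported above: 95% accuracy at 40% rejection versus 99% at 65%. A threshold chosen this way is only as reliable as the validation set is representative of the images it will later be applied to.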

Abstract

The digitization of natural history collections over the past three decades has unlocked a treasure trove of specimen imagery and metadata. There is great interest in making this data more useful by labeling it with additional trait data, and modern deep learning techniques using convolutional neural networks (CNNs) and similar architectures show particular promise for reducing the amount of manual labeling required from human experts, making the process much faster and less expensive. However, in most cases the accuracy of these approaches, typically in the range of 80-85%, is too low for the automatic labels to be used reliably. In this paper, we present and validate an approach that can greatly improve this accuracy, essentially by examining the confidence that the network has in the generated label as well as utilizing a user-defined threshold to reject…

Taxonomy

Topics: Image and Object Detection Techniques · Image Processing and 3D Reconstruction