Improving the accuracy of automated labeling of specimen images datasets via a confidence-based process
Quentin Bateux, Jonathan Koss, Patrick W. Sweeney, Erika Edwards, Nelson Rios, Aaron M. Dollar, Jinshan Xu

TL;DR
This paper introduces a method to boost the accuracy of automated labeling of specimen images by filtering out low-confidence predictions, enabling more reliable use in scientific research.
Contribution
A confidence-based filtering approach is introduced to significantly improve the accuracy of automated specimen labeling by rejecting uncertain predictions.
Findings
A naive model with 86% accuracy can reach over 95% accuracy on the labels it retains by rejecting the 40% of labels with the lowest confidence.
The method allows researchers to adjust accuracy-coverage trade-offs based on their specific needs.
The approach was successfully applied to a dataset of over 600,000 herbarium specimens to label reproductive states.
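The accuracy-coverage trade-off described in these findings can be illustrated with a minimal sketch. It assumes that "confidence" is the largest class probability output by the classifier, which is a common choice but is not confirmed by the text shown here. The function names and the toy probabilities and labels are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def filter_by_confidence(
    probs: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int, float]]:
    """Keep predictions whose top class probability is >= threshold.

    Returns (sample_index, predicted_class, confidence) for each kept sample.
    Assumption: confidence = maximum class probability.
    """
    kept = []
    for i, p in enumerate(probs):
        p = list(p)
        conf = max(p)
        if conf >= threshold:
            kept.append((i, p.index(conf), conf))
    return kept


def accuracy_and_coverage(
    probs: Sequence[Sequence[float]], labels: Sequence[int], threshold: float
) -> Tuple[Optional[float], float]:
    """Accuracy on the kept predictions and the fraction of samples kept."""
    kept = filter_by_confidence(probs, threshold)
    coverage = len(kept) / len(probs)
    if not kept:
        return None, coverage  # nothing accepted, accuracy is undefined
    correct = sum(1 for i, pred, _ in kept if pred == labels[i])
    return correct / len(kept), coverage


# Toy data (made up for illustration): 5 samples, 2 classes.
toy_probs = [
    [0.95, 0.05],
    [0.55, 0.45],  # low confidence, and wrong
    [0.10, 0.90],
    [0.60, 0.40],  # low confidence, but correct
    [0.20, 0.80],
]
toy_labels = [0, 1, 1, 0, 1]

for t in (0.0, 0.7):
    acc, cov = accuracy_and_coverage(toy_probs, toy_labels, t)
    print(f"threshold={t:.1f}  accuracy={acc:.2f}  coverage={cov:.2f}")
```

Raising the threshold discards the two uncertain predictions in this toy set, one of which was actually correct. That is the cost side of the trade-off: accuracy on the retained labels goes up while the number of labeled specimens goes down, and the user picks the threshold that suits their needs.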
Abstract
The digitization of natural history collections over the past three decades has unlocked a treasure trove of specimen imagery and metadata. There is great interest in making this data more useful by further labeling it with additional trait data, and modern “deep learning” machine learning techniques utilizing convolutional neural nets (CNNs) and similar networks show particular promise to reduce the amount of required manual labeling by human experts, making the process much faster and less expensive. However, in most cases, the accuracy of these approaches is too low for reliable utilization of the automatic labeling, typically in the range of 80-85% accuracy. In this paper, we present and validate an approach that can greatly improve this accuracy, essentially by examining the “confidence” that the network has in the generated label as well as utilizing a user-defined threshold to…
Taxonomy
Topics: Species Distribution and Climate Change · Cell Image Analysis Techniques · AI in cancer detection
