Application of Computer Vision to the Automated Extraction of Metadata from Natural History Specimen Labels: A Case Study on Herbarium Specimens
Jacopo Zacchigna, Weiwei Liu, Felice Andrea Pellegrino, Adriano Peron, Francesco Roma-Marzio, Lorenzo Peruzzi, Stefano Martellos

TL;DR
This paper presents an automated system using computer vision to extract metadata from herbarium specimen labels, improving efficiency and accuracy over traditional OCR methods.
Contribution
A novel end-to-end solution that uses a fine-tuned multimodal Transformer to extract metadata from herbarium labels, without image preprocessing or additional manual labeling for training.
Findings
The system achieved 85% accuracy, measured with Tree Edit Distance, on a test dataset from the University of Pisa.
Specimens carrying multiple labels with a mix of handwritten and typewritten text posed the greatest challenge for the model.
The approach offers flexibility for reuse and adaptation as newer foundational models become available.
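To make the Tree Edit Distance figure above more concrete, the following is a minimal sketch of how such an accuracy could be computed for one predicted record against its ground truth. It is not the authors' evaluation code: the normalization (one minus the distance divided by the distance from an empty record), the unit edit costs, and the helper names `to_tree`, `forest_dist`, and `ted_accuracy` are all assumptions made for illustration. The field names are real Darwin Core terms, but the values are invented.

```python
from functools import lru_cache


def to_tree(label, value):
    """Turn a nested dict (or a plain value) into a hashable (label, children) tree."""
    if isinstance(value, dict):
        # Sort keys so that field order does not affect the ordered-tree distance.
        children = tuple(to_tree(k, v) for k, v in sorted(value.items()))
    else:
        # A field value becomes a single leaf under the field node.
        children = ((str(value), ()),)
    return (label, children)


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        _, kids = g[-1]
        return forest_dist(f, g[:-1] + kids) + 1      # insert rightmost root of g
    if not g:
        _, kids = f[-1]
        return forest_dist(f[:-1] + kids, g) + 1      # delete rightmost root of f
    (lv, kv), (lw, kw) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + kv, g) + 1,              # delete rightmost root of f
        forest_dist(f, g[:-1] + kw) + 1,              # insert rightmost root of g
        # match the two rightmost roots (cost 1 if their labels differ)
        forest_dist(kv, kw) + forest_dist(f[:-1], g[:-1]) + (lv != lw),
    )


def ted_accuracy(pred, truth):
    """Assumed normalization: 1 - TED(pred, truth) / TED(empty, truth), floored at 0."""
    p = (to_tree("root", pred),)
    t = (to_tree("root", truth),)
    empty = (("root", ()),)
    dist = forest_dist(p, t)
    denom = forest_dist(empty, t)
    if denom == 0:
        return 1.0 if dist == 0 else 0.0
    return max(0.0, 1.0 - dist / denom)


# Invented example: one misspelled value out of two fields.
truth = {"scientificName": "Bellis perennis", "recordedBy": "A. Collector"}
pred = {"scientificName": "Bellis perenis", "recordedBy": "A. Collector"}
score = ted_accuracy(pred, truth)
```

With these unit costs a wrong value counts as one full edit, so the example scores 0.75 (one edit against four nodes below the root). A finer-grained variant could charge a partial cost based on the character-level distance between the two strings; whether the paper does so is not stated in this summary.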
Abstract
Extracting metadata from natural history collection labels is pivotal for the online publication of digitized specimens. Building on a pre-trained multimodal Transformer, we developed an end-to-end automated solution to extract metadata from digitally imaged herbarium specimen labels and map them to Darwin Core standard concepts. A second objective was to demonstrate the feasibility of applying state-of-the-art AI techniques to biodiversity data through a real-world use case that does not require image preprocessing or additional manual labeling for training. The proposed solution does not rely on closed-source services, is fine-tuned in-house, and can be used offline and locally. It can be flexibly reused by developers to extract metadata across different herbarium collections. Furthermore, its encoder and/or decoder component can be replaced to take advantage of newer foundational…
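As an illustration of what mapping extracted label fields to Darwin Core concepts can look like, here is a small sketch. The abstract does not describe how the mapping is implemented, so everything here is assumed: the raw field names on the left of `FIELD_TO_DWC`, the helper `to_darwin_core`, and the sample values are hypothetical. Only the target names (`scientificName`, `recordedBy`, `eventDate`, `locality`) are actual Darwin Core terms.

```python
# Hypothetical raw field names mapped to Darwin Core terms.
FIELD_TO_DWC = {
    "species": "scientificName",
    "collector": "recordedBy",
    "date": "eventDate",
    "place": "locality",
}


def to_darwin_core(raw):
    """Split raw extracted fields into a Darwin Core record and leftover fields."""
    record, unmapped = {}, {}
    for key, value in raw.items():
        text = " ".join(str(value).split())   # collapse stray whitespace
        if not text:
            continue                          # skip empty extractions
        term = FIELD_TO_DWC.get(key.strip().lower())
        if term:
            record[term] = text
        else:
            unmapped[key] = text              # keep for manual review
    return record, unmapped


# Invented sample of raw model output.
raw_fields = {
    "Species": "Bellis  perennis",
    "collector": "A. Collector",
    "date": "",
    "notes": "dry meadow",
}
record, unmapped = to_darwin_core(raw_fields)
```

Keeping unmapped fields separate, rather than dropping them, is a design choice of this sketch so that unexpected label content can still be inspected by a curator.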
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
