Fully automatic extraction of morphological traits from the Web: utopia or reality?
Diego Marcos, Robert van de Vlasakker, Ioannis N. Athanasiadis, Pierre Bonnet, Hervé Goeau, Alexis Joly, W. Daniel Kissling, César Leblanc, André S.J. van Proosdij, Konstantinos P. Panousis

TL;DR
This paper explores the feasibility of automatically extracting plant morphological traits from unstructured online text using large language models, demonstrating promising results in creating structured trait databases at scale.
Contribution
It introduces a novel LLM-based method for extracting plant traits from web text without manual curation, enabling large-scale trait database creation.
Findings
Automatically replicated three manually created species-trait matrices
Achieved an F1-score of over 75% in trait extraction
Found trait values for over half of all species-trait pairs from online text
Abstract
Plant morphological traits, their observable characteristics, are fundamental to understanding the role played by each species within its ecosystem. However, compiling trait information for even a moderate number of species is a demanding task that may take experts years to accomplish. At the same time, massive amounts of information about species descriptions are available online in the form of text, although the lack of structure makes this source of data impossible to use at scale. To overcome this, we propose to leverage recent advances in large language models (LLMs) and devise a mechanism for gathering and processing information on plant traits in the form of unstructured textual descriptions, without manual curation. We evaluate our approach by automatically replicating three manually created species-trait matrices. Our method managed to find values for over half of all…
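The evaluation idea in the abstract, comparing automatically extracted values against a manually created species-trait matrix, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `evaluate` function, the toy species and trait values, and the metric definitions (coverage as the share of reference pairs that received any value, and a micro-averaged F1 over exact-match values) are hypothetical and are not taken from the paper's actual protocol.

```python
from typing import Dict, Tuple

# A species-trait matrix stored sparsely: (species, trait) -> value.
# Pairs with no extracted value are simply absent from the dict.
Matrix = Dict[Tuple[str, str], str]


def evaluate(reference: Matrix, predicted: Matrix) -> Dict[str, float]:
    """Score a predicted matrix against a reference one (assumed metrics)."""
    # Only score predictions for pairs that exist in the reference.
    answered = [pair for pair in reference if pair in predicted]
    correct = sum(1 for pair in answered if predicted[pair] == reference[pair])

    coverage = len(answered) / len(reference) if reference else 0.0
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"coverage": coverage, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data, purely hypothetical.
reference = {
    ("Quercus robur", "leaf shape"): "lobed",
    ("Quercus robur", "fruit type"): "nut",
    ("Acer campestre", "leaf shape"): "lobed",
    ("Acer campestre", "fruit type"): "samara",
}
predicted = {
    ("Quercus robur", "leaf shape"): "lobed",
    ("Quercus robur", "fruit type"): "berry",   # wrong value
    ("Acer campestre", "leaf shape"): "lobed",
    # no value found for ("Acer campestre", "fruit type")
}

result = evaluate(reference, predicted)
print(result)
```

Keeping coverage separate from F1 mirrors the two kinds of claim listed in the findings: how many species-trait pairs received a value at all, and how accurate the extracted values were.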
Taxonomy
Topics: Natural Language Processing Techniques · Data Mining Algorithms and Applications
