# Plant attribute extraction: An enhancing three-stage deep learning model for relational triple extraction

**Authors:** Zhihao Zong, Hongtao Shan, Gaoyu Zhang, George Xianzhi Yuan, Shuyi Zhang, Fu Lee Wang, Jin Liu

PMC · DOI: 10.1371/journal.pone.0327186 · PLOS One · 2025-07-08

## TL;DR

This paper introduces a three-stage deep learning model that extracts structured plant attribute information from text. It replaces slow manual extraction and improves the F1-score by 1.4% over the PRGC model.

## Contribution

A novel three-stage model, Bwdgv, that improves relational triple extraction for plant attributes by modifying BERT's word embedding layer and adding multi-level information fusion to relation prediction.

## Key findings

- The Bwdgv model improves F1-score by 1.4% compared to the PRGC model.
- A modified BERT word embedding layer better integrates contextual information and reduces interference from erroneous information.
- Multi-level fusion in relation prediction helps highlight important information and correct errors.
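The summary does not describe the evaluation protocol behind the F1 comparison. Relational triple extraction is commonly scored with micro-averaged precision, recall, and F1 over exact-match triples: a predicted (subject, relation, object) triple counts as correct only if all three parts match a gold triple. The following is a minimal Python sketch under that assumption. The function name `triple_prf` and any example triples are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set, Tuple

# (subject, relation, object), e.g. (plant name, attribute category, attribute)
Triple = Tuple[str, str, str]


def triple_prf(
    gold_docs: Sequence[Set[Triple]],
    pred_docs: Sequence[Set[Triple]],
) -> Tuple[float, float, float]:
    """Hypothetical micro-averaged precision, recall and F1 over exact-match triples.

    gold_docs[k] and pred_docs[k] hold the gold and predicted triples for
    sentence k. Counts are summed over all sentences before dividing.
    """
    if len(gold_docs) != len(pred_docs):
        raise ValueError("gold and predicted corpora must be aligned sentence by sentence")
    n_correct = n_pred = n_gold = 0
    for gold, pred in zip(gold_docs, pred_docs):
        n_correct += len(gold & pred)  # exact match on all three elements
        n_pred += len(pred)
        n_gold += len(gold)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Summing counts per sentence, instead of pooling every triple into one set, keeps identical triples found in different sentences from being merged. Whether the reported 1.4% gain is measured this way is an assumption.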

## Abstract

Various plant attributes, such as growing environment, growth cycle, and ecological distribution, can support fields such as agricultural production and biodiversity. This information is widely dispersed across texts. Manual extraction is highly inefficient because it takes considerable time and increases the likelihood of overlooking relevant details. To convert textual data into structured information, we extract relational triples of the form (subject, relation, object), where the subject is a plant name, the object is a plant attribute, and the relation is the category of that attribute. To reduce complexity, we jointly extract entities and relations using a tagging scheme. The task is broken down into three parts. First, a matrix is used to match plant entities and plant attributes simultaneously. Next, the plant-attribute categories present in the text are predicted from a predefined set. Finally, the predicted attribute categories are matched with the entity-attribute pairs. Tagging-based methods typically use parameter sharing to enable interaction between subtasks, but this can also cause error amplification and unstable parameter updates. We therefore adopt improved techniques at different stages to enhance the model's performance, including an adjustment to BERT's word embedding layer and an optimization of relation prediction. The modified word embedding layer is intended to better integrate contextual information during text representation and to reduce interference from erroneous information. The relation prediction component performs multi-level fusion of textual information, correcting errors and highlighting important information. We name this three-stage method “Bwdgv”. Compared with the advanced PRGC model, the proposed method improves the F1-score by 1.4%. The extracted triples can be used to construct knowledge graphs and to support other tasks that make better use of plant attributes.
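The three decoding steps in the abstract can be illustrated with a toy sketch that turns already-computed scores into triples. The sketch makes several assumptions. Candidate plant-name spans (`subjects`) and attribute spans (`objects`) are given in advance. Each stage outputs probabilities in [0, 1]. A single threshold of 0.5 applies to every stage. The function `decode_triples`, its parameters, the relation labels, and all scores are hypothetical. In the paper these scores would come from the BERT-based network, whose architecture is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def decode_triples(
    subjects: Sequence[str],
    objects: Sequence[str],
    pair_scores: Sequence[Sequence[float]],
    relation_scores: Dict[str, float],
    pair_relation_scores: Dict[Tuple[int, int, str], float],
    threshold: float = 0.5,
) -> List[Triple]:
    """Hypothetical three-stage decoding of model scores into triples."""
    # Stage 1: entity-attribute matching. Keep every (subject, object) cell
    # of the score matrix that passes the threshold.
    pairs = [
        (i, j)
        for i in range(len(subjects))
        for j in range(len(objects))
        if pair_scores[i][j] >= threshold
    ]
    # Stage 2: multi-label prediction of attribute categories (relations)
    # for the whole sentence.
    relations = [r for r, s in relation_scores.items() if s >= threshold]
    # Stage 3: match each predicted category to each kept pair; a missing
    # score counts as 0.0, so unscored combinations are rejected.
    triples: List[Triple] = []
    for i, j in pairs:
        for r in relations:
            if pair_relation_scores.get((i, j, r), 0.0) >= threshold:
                triples.append((subjects[i], r, objects[j]))
    return triples


# Hypothetical scores for one sentence about Iris dichotoma.
subjects = ["Iris dichotoma"]
objects = ["sunny grassy slopes", "July to August"]
pair_scores = [[0.9, 0.8]]
relation_scores = {"growing_environment": 0.95, "growth_cycle": 0.85, "ecological_distribution": 0.1}
pair_relation_scores = {
    (0, 0, "growing_environment"): 0.9,
    (0, 0, "growth_cycle"): 0.2,
    (0, 1, "growing_environment"): 0.1,
    (0, 1, "growth_cycle"): 0.88,
}
print(decode_triples(subjects, objects, pair_scores, relation_scores, pair_relation_scores))
# [('Iris dichotoma', 'growing_environment', 'sunny grassy slopes'), ('Iris dichotoma', 'growth_cycle', 'July to August')]
```

Stage 2 prunes the relation set, so stage 3 only scores categories already predicted for the sentence ("ecological_distribution" is never considered above). This is the same idea behind PRGC-style potential-relation prediction. A single shared threshold is a simplification; a real system might tune one threshold per stage.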

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12237039/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12237039/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12237039/full.md

---
Source: https://tomesphere.com/paper/PMC12237039