Assessing the effectiveness of ontology-grounded AI term extraction using OntoGPT for environmental evidence synthesis
Ryan Y. Hodgson, Steven A. Robinson, Amélie C. Boutin, Felix K. Chan, Joseph R. Bennett, Rachel T. Buxton, J. Harry Caufield, Dalal E. L. Hanna, Tim Alamenciak

TL;DR
This paper evaluates OntoGPT, an ontology-grounded AI tool, for extracting information from environmental science articles. The tool shows potential for speeding up evidence synthesis, but it performs worse on study-specific or interpretation-heavy information.
Contribution
The study evaluates OntoGPT, a Python package that combines a large language model with ontologies for structured information extraction, as a method for environmental evidence synthesis.
Findings
OntoGPT achieved 65% average agreement with human reviewers in extracting information from environmental science articles.
Precision and recall scores were 58% and 57%, respectively, indicating moderate performance.
Agreement was higher for standardized information and lower for study-specific or interpretation-heavy data.
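As a rough illustration of how figures like these can be computed, the sketch below compares a set of human-extracted terms with a set of tool-extracted terms for a single field. It is not the paper's scoring procedure: the function name `score_extraction`, the case-insensitive exact matching, the use of Jaccard overlap as "agreement", and the example terms are all assumptions made for illustration.

```python
def score_extraction(human, tool):
    """Compare two collections of extracted terms (hypothetical helper).

    Terms are matched case-insensitively after trimming whitespace.
    "agreement" is computed as Jaccard overlap, which is an assumption;
    the study's own agreement definition may differ.
    """
    human_set = {t.strip().lower() for t in human}
    tool_set = {t.strip().lower() for t in tool}

    # Terms found by both the human reviewer and the tool.
    true_pos = len(human_set & tool_set)

    # Precision: share of tool-extracted terms that the human also extracted.
    precision = true_pos / len(tool_set) if tool_set else 0.0
    # Recall: share of human-extracted terms that the tool recovered.
    recall = true_pos / len(human_set) if human_set else 0.0

    # Jaccard overlap; two empty sets are treated as full agreement.
    union = human_set | tool_set
    agreement = true_pos / len(union) if union else 1.0

    return {"precision": precision, "recall": recall, "agreement": agreement}


# Made-up example terms, not data from the study.
human_terms = ["salt marsh", "mangrove", "vegetation cover", "Canada"]
tool_terms = ["Salt marsh", "mangrove", "seagrass"]
scores = score_extraction(human_terms, tool_terms)
print(scores)
```

Averaging such per-field scores over all articles would give summary figures comparable in kind to those reported above, although exact matching is stricter than a human judgment of equivalence and would likely understate agreement for free-text fields.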
Abstract
Evidence syntheses are valuable sources of robust and transparent knowledge that can identify gaps in research and inform evidence-based decision making. However, the process of synthesis is time-consuming and costly. We investigate a new AI-based method that uses a large language model (LLM) grounded in ontologies (i.e. structured machine-interpretable glossaries of domain terminology) to extract information from a set of 80 articles on coastal wetland restoration outcomes. We evaluated this method by comparing human-extracted data with data extracted by OntoGPT, a Python package that combines an LLM with ontologies to extract structured information. We found that OntoGPT achieved 65% average agreement with human reviewers, but agreement varied with the type of information requested for extraction. The highest agreement scores were found when extracting standardized information, and lower agreement…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Environmental Monitoring and Data Management · Geographic Information Systems Studies
