# Scientific knowledge graph and ontology generation using open large language models

**Authors:** Alexandru Oarga, Matthew Hart, Andres M. Bran, Magdalena Lederbauer, Philippe Schwaller

PMC · DOI: 10.1039/d5dd00275c · Digital Discovery · 2026-02-16

## TL;DR

This paper introduces a new method using open large language models to automatically generate knowledge graphs and ontologies from scientific literature, reducing the need for manual curation.

## Contribution

The novel contribution is a zero-shot, end-to-end approach for ontology and knowledge graph generation using open-source LLMs in scientific domains.

## Key findings

- The method successfully reconstructs existing knowledge graphs and ontologies of chemical elements and functional groups.
- It effectively generates structured knowledge in the complex and under-researched field of Single Atom Catalysts.
- The approach demonstrates potential for improving information retrieval and reasoning in specialized scientific areas.
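This summary does not say which metric the authors used to score reconstruction of the existing chemical-element and functional-group KG. One common choice is exact-match precision, recall and F1 over (subject, predicate, object) triples. The sketch below assumes that metric plus a simple case-folding normalization. `Triple`, `normalize` and `triple_scores` are hypothetical names, not the paper's code, and the toy triples are illustrative only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A (subject, predicate, object) edge in a knowledge graph."""
    subject: str
    predicate: str
    obj: str


def normalize(t: Triple) -> Triple:
    # Hypothetical normalization (an assumption, not the paper's method):
    # case-fold and strip whitespace so "Alkane" and "alkane " match.
    return Triple(*(part.strip().lower() for part in t))


def triple_scores(generated: set[Triple], reference: set[Triple]) -> dict[str, float]:
    """Exact-match precision, recall and F1 over normalized triples."""
    gen = {normalize(t) for t in generated}
    ref = {normalize(t) for t in reference}
    tp = len(gen & ref)  # triples present in both graphs
    precision = tp / len(gen) if gen else 0.0
    recall = tp / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative toy graphs, not data from the paper.
    reference = {
        Triple("alkane", "subclass_of", "hydrocarbon"),
        Triple("alkyne", "subclass_of", "hydrocarbon"),
        Triple("C", "member_of", "group 14"),
    }
    generated = {
        Triple("Alkane", "subclass_of", "hydrocarbon"),
        Triple("alkyne", "subclass_of", "hydrocarbon"),
        Triple("alkyne", "contains", "triple bond"),
    }
    print(triple_scores(generated, reference))  # precision, recall and f1 all 2/3
```

Exact matching is strict: a generated triple that uses a synonym for a reference label counts as a miss. Any fuzzy or embedding-based label matching would be a further assumption on top of this sketch.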

## Abstract

Knowledge graphs (KGs) are powerful tools for structured information modeling, increasingly recognized for their potential to enhance the factuality and reasoning capabilities of Large Language Models (LLMs). However, in scientific domains, KG representation is often constrained by the absence of ontologies capable of modeling complex hierarchies and relationships inherent in the data. Moreover, the manual curation of KGs and ontologies from scientific literature remains a time-intensive task typically performed by domain experts. This work proposes a novel method that leverages LLMs for zero-shot, end-to-end ontology and KG generation from scientific literature, implemented exclusively with open-source LLMs. We evaluate our approach by assessing its ability to reconstruct an existing KG and ontology of chemical elements and functional groups. Furthermore, we apply the method to the emerging field of Single Atom Catalysts (SACs), where information is scarce and unstructured. Our results demonstrate the effectiveness of our approach in automatically generating structured knowledge representations from complex scientific literature in areas where manual curation is challenging or time-consuming. The generated ontologies and KGs provide a foundation for improved information retrieval and reasoning in specialized fields, opening new avenues for LLM-assisted scientific research and knowledge management.
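The abstract argues that KG representation depends on an ontology able to model hierarchies. The minimal sketch below shows why. The ontology is assumed to be a set of `subclass_of` edges, and KG instances are typed by ontology classes. A query for a superclass then also returns instances of its subclasses via a transitive walk. The `Ontology` and `KnowledgeGraph` classes and the toy chemistry labels are hypothetical and do not reproduce the paper's schema.

```python
from collections import defaultdict


class Ontology:
    """Toy ontology: named classes linked by subclass_of edges (assumed acyclic)."""

    def __init__(self) -> None:
        self.parents: dict[str, set[str]] = defaultdict(set)

    def add_subclass(self, child: str, parent: str) -> None:
        self.parents[child].add(parent)

    def ancestors(self, cls: str) -> set[str]:
        """All superclasses of `cls`, following subclass_of edges transitively."""
        seen: set[str] = set()
        stack = list(self.parents.get(cls, ()))  # .get avoids creating empty entries
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents.get(c, ()))
        return seen


class KnowledgeGraph:
    """Instances typed by ontology classes, plus relation triples between them."""

    def __init__(self, ontology: Ontology) -> None:
        self.ontology = ontology
        self.types: dict[str, str] = {}
        self.triples: set[tuple[str, str, str]] = set()

    def add_instance(self, name: str, cls: str) -> None:
        self.types[name] = cls

    def add_triple(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def instances_of(self, cls: str) -> set[str]:
        """Instances whose asserted class is `cls` or any subclass of it."""
        return {
            name
            for name, t in self.types.items()
            if t == cls or cls in self.ontology.ancestors(t)
        }


if __name__ == "__main__":
    # Hypothetical toy hierarchy, for illustration only.
    onto = Ontology()
    onto.add_subclass("alkane", "hydrocarbon")
    onto.add_subclass("alkyne", "hydrocarbon")
    onto.add_subclass("hydrocarbon", "organic compound")

    kg = KnowledgeGraph(onto)
    kg.add_instance("ethane", "alkane")
    kg.add_instance("ethyne", "alkyne")
    kg.add_instance("carbon", "chemical element")
    kg.add_triple("ethane", "contains_element", "carbon")

    print(sorted(kg.instances_of("hydrocarbon")))       # ['ethane', 'ethyne']
    print(sorted(kg.instances_of("organic compound")))  # ['ethane', 'ethyne']
```

Without the `subclass_of` layer, a query for "hydrocarbon" would return nothing, because no instance is asserted directly at that level. That is the gap an automatically generated ontology is meant to fill.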

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12928120/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12928120/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/PMC12928120/full.md

---
Source: https://tomesphere.com/paper/PMC12928120