# MatKG: An autonomously generated knowledge graph in Material Science

**Authors:** Vineeth Venugopal, Elsa Olivetti

PMC · DOI: 10.1038/s41597-024-03039-z · 2024-02-17

## TL;DR

MatKG is a large knowledge graph for materials science that organizes entities and relationships extracted from the scientific literature to support materials discovery and analysis.

## Contribution

The authors present MatKG as the largest knowledge graph in materials science to date. It is generated autonomously from the literature and contains over 70,000 entities and 5.4 million unique triples.

## Key findings

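- MatKG covers entity types such as materials, properties, applications, characterization methods, synthesis methods, descriptors, and symmetry phase labels, all extracted from the literature with natural language processing.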
- The graph is available in CSV and RDF formats, with code and data shared publicly for research use.
- Potential applications include materials discovery, recommendation systems, and advanced analytics.
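
To make the CSV and RDF serializations concrete, here is a minimal sketch of how a handful of triples could be written to CSV, read back, and emitted as N-Triples lines. Everything in it is an assumption made for illustration: the column names (`head`, `relation`, `tail`), the relation labels, the example entities, and the `http://example.org/matkg/` namespace are hypothetical and are not taken from the MatKG release.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical triples: entity names and relation labels are illustrative only,
# not copied from the MatKG data files.
TRIPLES = [
    ("TiO2", "has_property", "band gap"),
    ("TiO2", "has_application", "photocatalysis"),
    ("ZnO", "has_application", "photocatalysis"),
    ("ZnO", "synthesized_by", "sol-gel"),
]


def to_csv(triples):
    """Serialize triples as CSV text with an assumed three-column header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["head", "relation", "tail"])
    writer.writerows(triples)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text produced by to_csv back into a list of triples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["head"], row["relation"], row["tail"]) for row in reader]


def to_ntriples(triples, base="http://example.org/matkg/"):
    """Emit one N-Triples line per triple under a placeholder namespace."""
    def iri(name):
        # Percent-encode the label so spaces and slashes stay valid in an IRI.
        return "<" + base + quote(name, safe="") + ">"

    return "\n".join(f"{iri(h)} {iri(r)} {iri(t)} ." for h, r, t in triples)


def heads_sharing_tail(triples, relation, tail):
    """Return the sorted heads linked to `tail` through `relation`."""
    return sorted({h for h, r, t in triples if r == relation and t == tail})


if __name__ == "__main__":
    print(to_csv(TRIPLES))
    print(to_ntriples(TRIPLES))
    print(heads_sharing_tail(TRIPLES, "has_application", "photocatalysis"))
```

The `heads_sharing_tail` lookup hints at how a recommendation query might start: materials that share an application become candidates for comparison. A real deployment would more likely load the RDF file into a triple store and query it with SPARQL.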

## Abstract

In this paper, we present MatKG, a knowledge graph in materials science that offers a repository of entities and relationships extracted from scientific literature. Using advanced natural language processing techniques, MatKG includes an array of entities, including materials, properties, applications, characterization and synthesis methods, descriptors, and symmetry phase labels. The graph is formulated based on statistical metrics, encompassing over 70,000 entities and 5.4 million unique triples. To enhance accessibility and utility, we have serialized MatKG in both CSV and RDF formats and made these, along with the code base, available to the research community. As the largest knowledge graph in materials science to date, MatKG provides structured organization of domain-specific data. Its deployment holds promise for various applications, including material discovery, recommendation systems, and advanced analytics.
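
The abstract says the graph is formulated from statistical metrics but does not name them. As one possible reading, the sketch below counts how often two entities of different types appear in the same document and keeps the pairs that meet a threshold. Co-occurrence counting, the `min_count` threshold, the relation naming scheme, and the toy documents are all assumptions chosen for illustration, not the authors' confirmed method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-document entity annotations (entity -> entity type).
DOCS = [
    {"TiO2": "material", "photocatalysis": "application", "XRD": "characterization"},
    {"TiO2": "material", "photocatalysis": "application"},
    {"ZnO": "material", "XRD": "characterization"},
]


def cooccurrence_triples(docs, min_count=2):
    """Count cross-type entity pairs per document and keep frequent ones.

    Returns a dict mapping (entity_a, relation, entity_b) to the number of
    documents in which the pair co-occurs. The relation label is simply the
    two entity types joined by a hyphen, which is an illustrative convention.
    """
    counts = Counter()
    for doc in docs:
        # Sorting gives each pair a stable orientation across documents.
        for (a, type_a), (b, type_b) in combinations(sorted(doc.items()), 2):
            if type_a != type_b:
                counts[(a, f"{type_a}-{type_b}", b)] += 1
    return {triple: n for triple, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    for triple, n in cooccurrence_triples(DOCS).items():
        print(triple, n)
```

Raw counts favor frequently mentioned entities, so a practical pipeline would probably normalize them, for example with pointwise mutual information, before ranking candidate triples.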

## Full-text entities

- **Genes:** CHM (CHM Rab escort protein) [NCBI Gene 1121] {aka DXS540, GGTA, HSD-32, REP-1, TCD}, CNTN2 (contactin 2) [NCBI Gene 6900] {aka AXT, EPEO5, FAME5, TAG-1, TAX, TAX1}
- **Diseases:** CHM-APL (MESH:D015794)
- **Chemicals:** Fe2O3 (MESH:C000499), Alkyl Hydroperoxide (-), PVDF (MESH:C024865), Ammonia (MESH:D000641), Cadmium Telluride (MESH:C028337), Bismuth (MESH:D001729), ethanol (MESH:D000431), methanol (MESH:D000432), graphene (MESH:D006108), CH4 (MESH:D008697), In2O3 (MESH:C047711), perovskite (MESH:C059910), sulfuric acid (MESH:C033158), platinum (MESH:D010984), Anatase (MESH:C009495), Alkane (MESH:D000473)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10874416/full.md

---
Source: https://tomesphere.com/paper/PMC10874416