# Leveraging Large Language Models for Generating Research Topic Ontologies: A Multi-Disciplinary Study

**Authors:** Tanay Aggarwal, Angelo Salatino, Francesco Osborne, Enrico Motta

arXiv: 2508.20693 · 2025-08-29

## TL;DR

This study evaluates how well large language models can identify semantic relationships between research topics in biomedicine, physics, and engineering, as a step toward automatically generating research topic ontologies with broader coverage and more frequent updates.

## Contribution

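The paper introduces PEM-Rel-8K, a new dataset of over 8,000 relationships between research topics extracted from the MeSH, PhySH, and IEEE taxonomies. It uses this dataset to evaluate how well LLMs identify such relationships in biomedicine, physics, and engineering under zero-shot prompting, chain-of-thought prompting, and fine-tuning.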
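
The relationship-identification task behind these conditions can be pictured as classifying a pair of topics into one relation label. The sketch below is a minimal illustration, not the authors' prompts: the label set (`broader`, `narrower`, `same-as`, `other`), the prompt wording, and the helper names `build_prompt` and `parse_label` are all assumptions. It shows how a zero-shot prompt differs from a chain-of-thought prompt and how a free-text reply might be mapped back to a label.

```python
import re

# Assumed label set; PEM-Rel-8K's actual labels and definitions may differ.
LABELS = ("broader", "narrower", "same-as", "other")


def build_prompt(topic_a: str, topic_b: str, chain_of_thought: bool = False) -> str:
    """Build a zero-shot or chain-of-thought prompt for one topic pair (hypothetical wording)."""
    prompt = (
        f"Research topic A: {topic_a}\n"
        f"Research topic B: {topic_b}\n"
        "Is A broader than B, narrower than B, the same as B, or none of these?\n"
        f"Choose one label from: {', '.join(LABELS)}.\n"
    )
    if chain_of_thought:
        # Chain-of-thought: request reasoning first, then the final label.
        prompt += "Think step by step, then end with a line 'Answer: <label>'."
    else:
        # Zero-shot: request the label directly.
        prompt += "Reply with a single line 'Answer: <label>'."
    return prompt


def parse_label(reply: str) -> str:
    """Return the label from the last 'Answer: ...' line, or 'other' if none is valid."""
    matches = re.findall(r"answer:\s*([a-z-]+)", reply, flags=re.IGNORECASE)
    if matches and matches[-1].lower() in LABELS:
        return matches[-1].lower()
    return "other"
```

Taking the last `Answer:` line matters for chain-of-thought replies, which may mention candidate labels while reasoning. Falling back to `other` for unparseable replies is one conservative choice; an evaluation could instead count them as errors.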

## Key findings

- Fine-tuning LLMs on PEM-Rel-8K yields excellent performance across all three disciplines (biomedicine, physics, and engineering).
- Models fine-tuned in one domain and applied to another show promising cross-domain transferability.
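
The cross-domain setting amounts to filling a 3×3 grid: train on one of the three domains and test on each of them, so the diagonal holds in-domain scores and the off-diagonal cells hold transfer scores. A minimal sketch follows, assuming macro-averaged F1 as the metric (this summary does not state which metric the paper reports) and a hypothetical `predict(train_domain, pairs)` hook wrapping a model fine-tuned on `train_domain`.

```python
from __future__ import annotations

from itertools import product
from typing import Callable, Sequence

DOMAINS = ("biomedicine", "physics", "engineering")


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def transfer_matrix(
    predict: Callable[[str, Sequence[tuple[str, str]]], list[str]],
    test_sets: dict[str, tuple[Sequence[tuple[str, str]], Sequence[str]]],
) -> dict[tuple[str, str], float]:
    """Score a model trained on each domain against every domain's test set.

    predict(train_domain, pairs) is a hypothetical hook around a model
    fine-tuned on train_domain; test_sets maps a domain to (pairs, gold_labels).
    """
    matrix = {}
    for train, test in product(DOMAINS, DOMAINS):
        pairs, gold = test_sets[test]
        matrix[(train, test)] = macro_f1(gold, predict(train, pairs))
    return matrix
```

Averaging the off-diagonal cells gives a single transfer score; comparing it with the diagonal indicates how much performance is lost when a model moves between domains.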

## Abstract

Ontologies and taxonomies of research fields are critical for managing and organising scientific knowledge, as they facilitate efficient classification, dissemination and retrieval of information. However, the creation and maintenance of such ontologies are expensive and time-consuming tasks, usually requiring the coordinated effort of multiple domain experts. Consequently, ontologies in this space often exhibit uneven coverage across different disciplines, limited inter-domain connectivity, and infrequent updating cycles. In this study, we investigate the capability of several large language models to identify semantic relationships among research topics within three academic domains: biomedicine, physics, and engineering. The models were evaluated under three distinct conditions: zero-shot prompting, chain-of-thought prompting, and fine-tuning on existing ontologies. Additionally, we assessed the cross-domain transferability of fine-tuned models by measuring their performance when trained in one domain and subsequently applied to a different one. To support this analysis, we introduce PEM-Rel-8K, a novel dataset consisting of over 8,000 relationships extracted from the most widely adopted taxonomies in the three disciplines considered in this study: MeSH, PhySH, and IEEE. Our experiments demonstrate that fine-tuning LLMs on PEM-Rel-8K yields excellent performance across all disciplines.
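
To make the idea of relationships extracted from taxonomies concrete, the sketch below turns the parent-child edges of a toy hierarchy into directed, labelled topic pairs. It is an illustration under stated assumptions, not the procedure used to build PEM-Rel-8K: the `broader`/`narrower` labels, the `TopicPair` record, and the toy topics are hypothetical, and a real dataset would presumably also need examples of other relation types, which this sketch omits.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical labels: `label` states how `head` relates to `tail`.
BROADER = "broader"
NARROWER = "narrower"


@dataclass(frozen=True)
class TopicPair:
    head: str
    tail: str
    label: str


def pairs_from_taxonomy(children: dict[str, list[str]]) -> list[TopicPair]:
    """Emit both directions of every parent -> child edge as labelled pairs."""
    pairs: list[TopicPair] = []
    for parent, kids in children.items():
        for child in kids:
            pairs.append(TopicPair(parent, child, BROADER))   # parent is broader than child
            pairs.append(TopicPair(child, parent, NARROWER))  # child is narrower than parent
    return pairs


# Toy hierarchy for illustration only; not taken from PhySH or any other taxonomy.
toy = {"Condensed matter physics": ["Superconductivity", "Magnetism"]}
for pair in pairs_from_taxonomy(toy):
    print(pair)
```

Emitting both directions of each edge gives the model examples of both `broader` and `narrower` from the same hierarchy.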

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20693/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20693/full.md

## References

66 references — full list in the complete paper: https://tomesphere.com/paper/2508.20693/full.md

---
Source: https://tomesphere.com/paper/2508.20693