A Hybrid AI Methodology for Generating Ontologies of Research Topics from Scientific Paper Corpora
Alessia Pisu, Livio Pompianu, Francesco Osborne, Diego Reforgiato Recupero, Daniele Riboni, Angelo Salatino

TL;DR
This paper introduces Sci-OG, a semi-automated system that leverages language models and semantic analysis to generate research topic ontologies from scientific papers, enhancing literature organization and exploration.
Contribution
The paper presents a multi-step methodology that combines topic extraction, relationship classification, and ontology construction using language models, and reports that it outperforms fine-tuned SciBERT and GPT4-mini baselines.
Findings
Achieved an F1 score of 0.951 in relationship classification.
Outperformed fine-tuned SciBERT and GPT4-mini baselines.
Demonstrated practical extension of the CSO ontology in cybersecurity.
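For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The snippet below is a minimal sketch of that standard definition only. The function name and the example inputs are illustrative, and this summary does not state the paper's precision and recall values or how scores are averaged across relationship classes.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both components are 0
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only; these are NOT values reported by the paper.
example = f1_score(0.5, 1.0)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a score near 0.95 is only reachable when precision and recall are both high.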
Abstract
Taxonomies and ontologies of research topics (e.g., MeSH, UMLS, CSO, NLM) play a central role in providing the primary framework through which intelligent systems can explore and interpret the literature. However, these resources have traditionally been manually curated, a process that is time-consuming, prone to obsolescence, and limited in granularity. This paper presents Sci-OG, a semi-automated methodology for generating research topic ontologies, employing a multi-step approach: 1) Topic Discovery, extracting potential topics from research papers; 2) Relationship Classification, determining semantic relationships between topic pairs; and 3) Ontology Construction, refining and organizing topics into a structured ontology. The relationship classification component, which constitutes the core of the system, integrates an encoder-based language model with features describing topic…
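The three steps named in the abstract can be illustrated with a toy pipeline. This is a sketch under strong assumptions, not the Sci-OG implementation: the function names, the relation labels ("broader", "same-as", "other"), the keyword-frequency filter, and the suffix heuristic are all hypothetical stand-ins. In particular, classify_pair replaces the paper's encoder-based language model with a trivial string rule so that the example runs without any model.

```python
from collections import Counter
from itertools import permutations


def discover_topics(papers, min_papers=2):
    """Step 1 (toy): keep keywords that occur in at least `min_papers` papers."""
    counts = Counter(kw for kws in papers for kw in {k.lower() for k in kws})
    return sorted(t for t, n in counts.items() if n >= min_papers)


def classify_pair(a, b):
    """Step 2 (toy stand-in for the language-model classifier).

    Hypothetical rule: `a` is broader than `b` if `a` is a proper
    word-level suffix of `b` (e.g. "security" vs "network security").
    """
    ta, tb = a.split(), b.split()
    if ta == tb:
        return "same-as"
    if len(ta) < len(tb) and tb[-len(ta):] == ta:
        return "broader"
    return "other"


def build_ontology(topics):
    """Step 3 (toy): map each topic to the set of its narrower topics."""
    ontology = {t: set() for t in topics}
    for a, b in permutations(topics, 2):
        if classify_pair(a, b) == "broader":
            ontology[a].add(b)
    return ontology


# Made-up keyword lists standing in for a paper corpus.
papers = [
    ["Security", "Network Security"],
    ["security", "network security", "intrusion detection"],
    ["Intrusion Detection", "Security"],
]
topics = discover_topics(papers)
ontology = build_ontology(topics)
```

With this toy corpus, "security" ends up as the parent of "network security", while "intrusion detection" stays unattached because the string rule cannot see that relationship. Links of that kind are what a learned relationship classifier is meant to recover.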
