VSPO: Validating Semantic Pitfalls in Ontology via LLM-Based CQ Generation

Hyojun Choi; Seokju Hwang; Kyong-Ho Lee

arXiv:2511.07991 · cs.AI · November 19, 2025

TL;DR

This paper introduces VSPO, a dataset and model leveraging LLMs to generate competency questions that detect semantic pitfalls in ontologies, improving validation accuracy and reducing manual effort.

Contribution

The study presents the first LLM-based approach to generating competency questions specifically designed to validate semantic pitfalls in ontologies, improving the detection of modeling errors.

Findings

01. Model achieves 26% higher precision than GPT-4.1

02. Model achieves 28.2% higher recall than GPT-4.1

03. Generates a broader range of error-detecting CQs

Abstract

Competency Questions (CQs) play a crucial role in validating ontology design. While manually crafting CQs can be highly time-consuming and costly for ontology engineers, recent studies have explored the use of large language models (LLMs) to automate this process. However, prior approaches have largely evaluated generated CQs based on their similarity to existing datasets, which often fail to verify semantic pitfalls such as "Misusing allValuesFrom". Since such pitfalls cannot be reliably detected through rule-based methods, we propose a novel dataset and model of Validating Semantic Pitfalls in Ontology (VSPO) for CQ generation specifically designed to verify the semantic pitfalls. To simulate missing and misused axioms, we use LLMs to generate natural language definitions of classes and properties and introduce misalignments between the definitions and the ontology by removing axioms…
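
The axiom-removal step mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' pipeline: the ontology is a toy set of (subject, predicate, object) triples, the definition is a hand-written placeholder standing in for an LLM-generated one, and the names `PitfallExample` and `make_missing_axiom_examples` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical representation: one axiom is a (subject, predicate, object) triple.
Axiom = tuple[str, str, str]


@dataclass(frozen=True)
class PitfallExample:
    """One simulated 'missing axiom' case (illustrative structure only)."""
    definition: str              # natural language definition of the class
    ontology: frozenset[Axiom]   # ontology with one axiom removed
    removed: Axiom               # the axiom a good CQ should expose as missing


def make_missing_axiom_examples(
    ontology: set[Axiom], definitions: dict[str, str]
) -> list[PitfallExample]:
    """Remove one axiom at a time for every class that has a definition.

    The definition still describes the full class, so the perturbed ontology
    no longer agrees with it. That gap is the simulated pitfall.
    """
    examples = []
    for axiom in sorted(ontology):  # sorted for a deterministic order
        subject = axiom[0]
        if subject not in definitions:
            continue
        perturbed = frozenset(ontology) - {axiom}
        examples.append(PitfallExample(definitions[subject], perturbed, axiom))
    return examples


# Toy data, invented for this sketch.
toy_ontology = {
    ("Pizza", "subClassOf", "Food"),
    ("Pizza", "hasTopping some", "Topping"),
    ("Topping", "subClassOf", "Food"),
}
toy_definitions = {
    # Placeholder for an LLM-generated definition.
    "Pizza": "A pizza is a food that has at least one topping.",
}

examples = make_missing_axiom_examples(toy_ontology, toy_definitions)
for ex in examples:
    print("removed:", ex.removed, "| remaining axioms:", len(ex.ontology))
```

Each resulting pair of definition and perturbed ontology could then serve as input for generating a CQ that targets the removed axiom. How the paper builds its inputs and labels beyond the truncated abstract is not shown here.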

Taxonomy

Topics: Semantic Web and Ontologies · Topic Modeling · Advanced Graph Neural Networks