Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed Models

Reham Alharbi; Valentina Tamma; Terry R. Payne; Jacopo de Berardinis

arXiv:2604.16258·cs.AI·April 20, 2026

TL;DR

This study systematically compares competency questions generated by various large language models across multiple domains, analyzing their readability, relevance, and structural complexity to understand their intrinsic properties.

Contribution

It introduces quantitative measures for cross-domain analysis of LLM-generated competency questions, highlighting how different models and use cases influence their properties.

Findings

01. LLMs produce competency questions with varying readability and complexity.

02. Generation profiles differ significantly across open and closed models.

03. Use case context impacts the quality and properties of generated questions.

Abstract

Competency Questions (CQs) are a cornerstone of requirement elicitation in ontology engineering. CQs represent requirements as a set of natural language questions that an ontology should satisfy; they are traditionally modelled by ontology engineers together with domain experts as part of a human-centred, manual elicitation process. The use of Generative AI automates CQ creation at scale, therefore democratising the process of generation, widening stakeholder engagement, and ultimately broadening access to ontology engineering. However, given the large and heterogeneous landscape of LLMs, varying in dimensions such as parameter scale, task and domain specialisation, and accessibility, it is crucial to characterise and understand the intrinsic, observable properties of the CQs they produce (e.g., readability, structural complexity) through a systematic, cross-domain analysis. This paper…
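
As an illustration of what an "intrinsic, observable property" of a CQ might look like in practice, the sketch below computes a few surface measures for a single question: word count, average word length, and a Flesch Reading Ease score based on a naive vowel-group syllable counter. The abstract does not state which measures the paper actually uses, so the function name `cq_profile`, the chosen measures, and the example questions are all assumptions made for illustration only.

```python
import re

# Naive syllable heuristic: one syllable per run of vowels (an assumption;
# real readability tools use more careful rules or pronunciation lexicons).
VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Approximate the syllable count of a word by counting vowel groups."""
    return max(1, len(VOWEL_GROUP.findall(word.lower())))


def cq_profile(question: str) -> dict:
    """Return simple surface measures for one competency question.

    Hypothetical helper: treats the question as a single sentence, so the
    words-per-sentence term of Flesch Reading Ease equals the word count.
    """
    words = re.findall(r"[A-Za-z']+", question)
    n_words = len(words)
    if n_words == 0:
        raise ValueError("question contains no words")
    syllables = sum(count_syllables(w) for w in words)
    # Flesch Reading Ease: higher scores indicate easier text.
    flesch = 206.835 - 1.015 * n_words - 84.6 * (syllables / n_words)
    return {
        "n_words": n_words,
        "syllables": syllables,
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "flesch_reading_ease": flesch,
    }


if __name__ == "__main__":
    # Made-up example questions, not taken from the paper.
    for cq in ["Which sensors measure temperature?", "Who wrote it?"]:
        print(cq, cq_profile(cq))
```

Measures like these are cheap to compute per question, which is what makes side-by-side comparison across models and domains feasible; structural complexity would likely need a parser-based measure that this sketch does not attempt.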
