LLM-based Generation of Semantically Diverse and Realistic Domain Model Instances
Andrei Coman, Lola Burgueño, Dominik Bork, Manuel Wimmer

TL;DR
This paper introduces a method that uses Large Language Models and two prompting strategies, together with existing model validation tools, to generate semantically realistic and diverse domain model instances, so that the instances are human-understandable and useful in educational or data-driven research contexts.
Contribution
The approach combines LLMs, two prompting strategies, and existing model validation tools to produce semantically coherent and diverse instances of UML class diagrams, targeting two open challenges: realistic values in instances and semantic diversity across generated models.
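One way to picture how an LLM and a validation tool can be combined is a generate-then-validate loop. The sketch below is illustrative only and is not taken from the paper: the feedback loop itself, the names DomainClass, build_prompt, validate, generate_instance, and fake_llm, and the Book example are all assumptions, the LLM is replaced by a stub, and the validator checks only attribute presence and types rather than full conformance to a domain model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attribute:
    name: str
    type_: type


@dataclass
class DomainClass:
    """Hypothetical, heavily simplified stand-in for a UML class."""
    name: str
    attributes: list[Attribute]


def build_prompt(cls: DomainClass, feedback: list[str]) -> str:
    # Assumed prompt layout; the paper's prompting strategies are not shown here.
    attrs = ", ".join(f"{a.name}: {a.type_.__name__}" for a in cls.attributes)
    prompt = f"Create a realistic instance of {cls.name} with attributes ({attrs})."
    if feedback:
        prompt += " Fix these problems: " + "; ".join(feedback)
    return prompt


def validate(cls: DomainClass, instance: dict) -> list[str]:
    # Toy structural check standing in for an external model validation tool.
    errors = []
    for attr in cls.attributes:
        if attr.name not in instance:
            errors.append(f"missing attribute '{attr.name}'")
        elif not isinstance(instance[attr.name], attr.type_):
            errors.append(f"attribute '{attr.name}' is not {attr.type_.__name__}")
    return errors


def generate_instance(
    cls: DomainClass, llm: Callable[[str], dict], max_attempts: int = 3
) -> Optional[dict]:
    # Ask for a candidate, validate it, and feed any errors into the next prompt.
    feedback: list[str] = []
    for _ in range(max_attempts):
        candidate = llm(build_prompt(cls, feedback))
        feedback = validate(cls, candidate)
        if not feedback:
            return candidate
    return None  # no valid instance within the attempt budget


# Stub LLM (assumption): the first answer is incomplete, the second is valid.
responses = iter([{"title": "Dune"}, {"title": "Dune", "pages": 412}])


def fake_llm(prompt: str) -> dict:
    return next(responses)


book = DomainClass("Book", [Attribute("title", str), Attribute("pages", int)])
result = generate_instance(book, fake_llm)
```

In this toy run the first candidate is rejected because the pages attribute is missing, and the second one passes the check. A real setup would replace fake_llm with an actual model call and validate with a dedicated model validation tool.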
Findings
Generated instances are mostly syntactically correct.
Instances conform to the domain models with few semantic errors.
Values in generated models are semantically diverse and coherent.
Abstract
Large Language Models (LLMs) have been recently proposed for supporting domain modeling tasks mostly related to the completion of partial models by recommending additional model elements. However, there are many more modeling tasks, one of them being the instantiation of domain models to represent concrete domain objects. While there is considerable work supporting the generation of structurally valid instantiations, there are still open challenges to incorporating real-world semantics by having realistic values contained in instances and ensuring the generation of semantically diverse models. Only then will such generated models become human-understandable and helpful in educational or data-driven research contexts. To tackle these challenges, this paper presents an approach that employs LLMs and two prompting strategies in combination with existing model validation tools for…
