MCeT: Behavioral Model Correctness Evaluation using Large Language Models
Khaled Ahmed, Jialing Song, Boqi Chen, Ou Wei, Bingzhou Zheng

TL;DR
This paper introduces MCeT, an automated tool that uses large language models to evaluate the correctness of behavioral sequence diagrams against their requirements, substantially improving issue-detection precision over directly asking an LLM to compare a diagram with its requirements.
Contribution
The paper presents the first fully automated LLM-based tool for behavioral model correctness evaluation, combining multi-perspective analysis and self-consistency checks to enhance accuracy.
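This summary does not describe how MCeT performs its self-consistency checks. The sketch below only illustrates the general idea: sample several LLM runs per analysis perspective and keep only the issues that a strict majority of runs agree on. The perspective names, the string-normalization step, and the majority threshold are hypothetical assumptions, and the LLM calls are replaced by precomputed outputs.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical input shape: perspective name -> sampled runs, each run a list of issue strings.
PerspectiveRuns = Dict[str, List[List[str]]]


def normalize(issue: str) -> str:
    """Crude normalization (assumption) so trivially different phrasings match."""
    return " ".join(issue.lower().split())


def self_consistent_issues(runs: List[List[str]], threshold: float = 0.5) -> Set[str]:
    """Keep issues reported in strictly more than `threshold` of the sampled runs."""
    if not runs:
        return set()
    counts: Counter = Counter()
    for run in runs:
        # A set ensures each issue is counted at most once per run.
        counts.update({normalize(issue) for issue in run})
    return {issue for issue, n in counts.items() if n / len(runs) > threshold}


def aggregate(perspectives: PerspectiveRuns, threshold: float = 0.5) -> List[str]:
    """Union of the self-consistent issues found under each (hypothetical) perspective."""
    merged: Set[str] = set()
    for runs in perspectives.values():
        merged |= self_consistent_issues(runs, threshold)
    return sorted(merged)


if __name__ == "__main__":
    # Hypothetical outputs of three LLM runs for two hypothetical perspectives.
    runs_by_perspective = {
        "message_order": [
            ["Missing reply from Server to Client"],
            ["missing reply from server to client", "Extra login message"],
            ["Missing reply from Server to Client"],
        ],
        "participants": [
            ["Database lifeline not in requirements"],
            [],
            ["Database lifeline not in requirements"],
        ],
    }
    print(aggregate(runs_by_perspective))
```

Here "Extra login message" appears in only one of three runs and is discarded, while the other two issues survive the vote. A stricter threshold trades recall for precision, which is the kind of trade-off a self-consistency filter is meant to control.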
Findings
Improves precision in correctness evaluation from 0.58 to 0.81.
Detects 90% more issues than experienced engineers.
Reports an average of 6 new issues per diagram.
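To put the first finding in context, precision is the fraction of reported issues that are genuine (true positives, TP) rather than false alarms (false positives, FP); how the paper judged an issue to be genuine is not stated in this summary. Under that standard definition, the reported change is roughly a 40% relative gain:

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad \frac{0.81 - 0.58}{0.58} = \frac{0.23}{0.58} \approx 0.40
$$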
Abstract
Behavioral model diagrams, e.g., sequence diagrams, are an essential form of documentation, typically designed by system engineers from requirements documentation, either fully manually or with the assistance of design tools. With the growing use of Large Language Models (LLMs) as AI modeling assistants, more automation will be involved in generating diagrams. This necessitates the advancement of automatic model correctness evaluation tools. Such a tool can be used to evaluate models created both manually and automatically by AI, to provide feedback to system engineers, and to enable AI assistants to self-evaluate and self-enhance their generated models. In this paper, we propose MCeT, the first fully automated tool to evaluate the correctness of a behavioral model, sequence diagrams in particular, against its corresponding requirements text and produce a list of issues that the model has. We…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Model-Driven Software Engineering Techniques
