(How) Do Large Language Models Understand High-Level Message Sequence Charts?
Mohammad Reza Mousavi

TL;DR
This study evaluates whether large language models understand the formal semantics of High-Level Message Sequence Charts (HMSCs). It finds modest overall comprehension: the models handle basic semantic constructs well but struggle with abstraction, composition, and causal dependency reasoning.
Contribution
The paper provides an empirical assessment of how well LLMs understand the formal semantics of HMSCs, highlighting their strengths and limitations in semantic reasoning tasks.
Findings
LLMs achieve about 52% overall accuracy on HMSC semantic tasks.
They understand basic semantic concepts of MSCs with approximately 88% accuracy.
They struggle with abstraction, composition, and causal dependency reasoning, with accuracy around 36-42%.
Abstract
Large Language Models (LLMs) are being employed widely to automate tasks across the software development life-cycle. It is, however, unclear whether these tasks are performed consistently with respect to the semantics of the artefacts being handled. This question is particularly under-researched concerning architectural design specification. In this paper, we address this question for High-Level Message Sequence Charts (HMSCs). These are visual models with a rigorous formal semantics that have been used for various purposes, including as a foundation for Sequence Diagrams in the Unified Modelling Language (UML). We examine whether LLMs "understand" the semantics of HMSCs by evaluating how three LLMs (Gemini-3, GPT-5.4, and Qwen-3.6) perform on 129 semantic tasks, ranging from querying basic semantic constructs in HMSCs (i.e., events and their ordering) to semantic-preserving…
