TL;DR
This paper presents an automated pipeline that extracts UML class diagrams from natural language specifications to support model-driven engineering. Precision and recall are currently low, so the result is positioned as a baseline for future work.
Contribution
It introduces a novel dataset and a structured pipeline approach for UML diagram extraction from natural language, establishing a benchmark for future research.
Findings
The approach provides a baseline for UML diagram extraction.
It achieves low precision and recall, highlighting room for improvement.
A new dataset and evaluation framework are introduced.
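To make the precision and recall figures concrete, here is a minimal sketch of how the two metrics could be computed for an extracted diagram. It assumes that diagram elements are compared by exact name match against a reference diagram. That matching rule, the function name `precision_recall`, and the sample class names are illustrative assumptions, not the paper's evaluation framework.

```python
def precision_recall(predicted: set, reference: set) -> tuple:
    """Score predicted diagram elements against a reference set.

    Assumption: an element counts as correct only on an exact match.
    """
    true_positives = len(predicted & reference)
    # Precision: share of predicted elements that appear in the reference.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference elements that were predicted.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical class names, for illustration only.
predicted_classes = {"Library", "Catalog", "Item"}
reference_classes = {"Library", "Catalog", "Book", "Member"}
precision, recall = precision_recall(predicted_classes, reference_classes)
```

In this example two of the three predicted classes are correct and two of the four reference classes are found. The same function could be applied separately to attributes or relationships if they are encoded as hashable tuples.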
Abstract
In model-driven engineering (MDE), UML class diagrams serve as a way for developers to plan and communicate. However, creating these diagrams is complex and resource-consuming. We propose an automated approach for the extraction of UML class diagrams from natural language software specifications. To develop our approach, we create a dataset of UML class diagrams and their English specifications with the help of volunteers. Our approach is a pipeline of steps consisting of the segmentation of the input into sentences, the classification of the sentences, the generation of UML class diagram fragments from sentences, and the composition of these fragments into one UML class diagram. We develop a quantitative testing framework specific to UML class diagram extraction. Our approach yields low precision and recall but serves as a benchmark for future research.
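The four pipeline stages named in the abstract (segmentation, classification, fragment generation, composition) can be sketched as follows. This is a toy illustration only: the regular-expression rules, the two sentence categories, and all names (`Fragment`, `segment`, `classify`, `generate_fragment`, `compose`, `extract`) are assumptions made for the example and do not reproduce the paper's actual components.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """A partial class diagram: class names plus (source, label, target) relations."""
    classes: set = field(default_factory=set)
    relations: set = field(default_factory=set)


def segment(text: str) -> list:
    # Stage 1: naive sentence segmentation on ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify(sentence: str) -> str:
    # Stage 2: toy rule. "has a/an" marks a relationship sentence,
    # anything else is treated as a class sentence.
    return "relationship" if re.search(r"\bhas an?\b", sentence) else "class"


def generate_fragment(sentence: str, kind: str) -> Fragment:
    # Stage 3: build a diagram fragment from a single sentence.
    fragment = Fragment()
    if kind == "relationship":
        match = re.match(r"(?:An?|The)\s+(\w+)\s+has an?\s+(\w+)", sentence, re.IGNORECASE)
        if match:
            source, target = match.group(1).capitalize(), match.group(2).capitalize()
            fragment.classes |= {source, target}
            fragment.relations.add((source, "has", target))
    else:
        match = re.match(r"(?:An?|The)\s+(\w+)", sentence, re.IGNORECASE)
        if match:
            fragment.classes.add(match.group(1).capitalize())
    return fragment


def compose(fragments: list) -> Fragment:
    # Stage 4: merge fragments; classes with the same name are unified.
    diagram = Fragment()
    for fragment in fragments:
        diagram.classes |= fragment.classes
        diagram.relations |= fragment.relations
    return diagram


def extract(text: str) -> Fragment:
    sentences = segment(text)
    return compose([generate_fragment(s, classify(s)) for s in sentences])


diagram = extract("A library has a catalog. A book is an item.")
```

Keeping each stage as a separate function mirrors the pipeline structure described above, so any single stage (for instance, the sentence classifier) could be replaced by a learned model without touching the others.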
