Synthline: A Product Line Approach for Synthetic Requirements Engineering Data Generation using Large Language Models
Abdelkarim El-Hajjami, Camille Salinesi

TL;DR
Synthline is a product line approach that uses Large Language Models to generate synthetic requirements engineering (RE) data, addressing data scarcity in RE and improving ML model training.
Contribution
The paper presents a novel product line (PL) based method for systematically generating synthetic RE data with LLMs, with the aim of improving data diversity and utility for ML applications.
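As a rough illustration of what a product line approach to data generation might look like, the sketch below enumerates every combination of a few feature dimensions and turns each combination into a prompt for an LLM. The feature names, their values, and the prompt wording are all hypothetical; this summary does not describe the paper's actual feature model or prompts, and no LLM is called.

```python
from itertools import product

# Hypothetical feature dimensions (assumption: not taken from the paper).
FEATURES = {
    "defect_type": ["ambiguity", "incompleteness"],
    "domain": ["healthcare", "automotive"],
    "style": ["formal", "informal"],
}


def enumerate_configurations(features):
    """Yield one dict per valid combination of feature values."""
    names = list(features)
    for values in product(*(features[name] for name in names)):
        yield dict(zip(names, values))


def build_prompt(config):
    """Render a configuration as a generation prompt (hypothetical wording)."""
    return (
        f"Write one {config['style']} software requirement for the "
        f"{config['domain']} domain that exhibits the defect "
        f"'{config['defect_type']}'."
    )


configs = list(enumerate_configurations(FEATURES))
prompts = [build_prompt(c) for c in configs]

for prompt in prompts[:2]:
    print(prompt)
```

Each prompt carries its configuration's `defect_type`, so the text returned by an LLM could be labeled with that class and used as a training sample for a classifier.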
Findings
Synthetic data is less diverse than real data but still works as a training resource.
Hybrid datasets that combine synthetic and real data substantially improve ML performance.
Up to 85% precision improvement with combined data.
Abstract
While modern Requirements Engineering (RE) heavily relies on natural language processing and Machine Learning (ML) techniques, their effectiveness is limited by the scarcity of high-quality datasets. This paper introduces Synthline, a Product Line (PL) approach that leverages Large Language Models to systematically generate synthetic RE data for classification-based use cases. Through an empirical evaluation conducted in the context of using ML for the identification of requirements specification defects, we investigated both the diversity of the generated data and its utility for training downstream models. Our analysis reveals that while synthetic datasets exhibit less diversity than real data, they are good enough to serve as viable training resources. Moreover, our evaluation shows that combining synthetic and real data leads to substantial performance improvements. Specifically,…
Taxonomy
Topics: Advanced Software Engineering Methodologies · Service-Oriented Architecture and Web Services · Model-Driven Software Engineering Techniques
