Extracting Software Requirements from Unstructured Documents
Vladimir Ivanov, Andrey Sadovykh, Alexandr Naumchev, Alessandra Bagnato, Kirill Yakovlev

TL;DR
This paper presents a manually annotated version of the PURE dataset and a fine-tuned BERT model for extracting software requirements from unstructured text. It reports high precision and recall on binary sentence classification and also evaluates the model on less standardized Request for Information (RFI) documents.
Contribution
A manually annotated version of the PURE dataset, with sentences labeled as requirements or non-requirements, and a BERT model fine-tuned on it to extract requirements from unstructured text.
Findings
BERT achieved high precision and recall on the PURE dataset.
The approach is effective on less standardized RFI documents.
The method outperforms several baseline models.
Abstract
Identifying (extracting) requirements in textual documents is a tedious and error-prone task that many researchers suggest automating. We manually annotated the PURE dataset, creating a new dataset that contains both requirements and non-requirements. Using this dataset, we fine-tuned the BERT model and compared the results with several baselines such as fastText and ELMo. To evaluate the model on semantically more complex documents, we compared the PURE dataset results with experiments on Request for Information (RFI) documents. RFIs often include software requirements, but in a less standardized way. The fine-tuned BERT showed promising results on the PURE dataset on the binary sentence classification task. Compared with previous and recent studies dealing with constrained inputs, our approach demonstrates high performance in terms of precision and recall metrics, while…
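The abstract evaluates binary sentence classification (requirement vs. non-requirement) with precision and recall. As a minimal sketch of how those two metrics are computed for the requirement class, assuming labels are encoded as 1 = requirement and 0 = non-requirement (an encoding chosen here for illustration, not taken from the paper), the function name `precision_recall` and the sample labels below are hypothetical:

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = requirement)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # True positives: sentences correctly flagged as requirements.
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    # False positives: non-requirements flagged as requirements.
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    # False negatives: requirements the classifier missed.
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for six sentences (illustrative only, not data from the paper).
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision penalizes non-requirement sentences that are wrongly extracted, while recall penalizes requirements that are missed, so reporting both gives a fuller picture than accuracy alone when the two classes are imbalanced.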
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software System Performance and Reliability
