Semi-Automated Labeling of Requirement Datasets for Relation Extraction
Jeremias Bohn, Jannik Fischbach, Martin Schmitt, Hinrich Schütze, Andreas Vogelsang

TL;DR
This paper introduces a semi-automatic framework for labeling requirement datasets for relation extraction, reducing manual effort and bias, and provides a new dataset with both automatic and manual labels.
Contribution
The paper presents a novel semi-automatic labeling framework and a new dataset for relation extraction in requirements engineering.
Findings
Substantial overlap between automatic and manual labels
Framework reduces manual labeling effort
Dataset facilitates future research in relation extraction
Abstract
Creating datasets manually by human annotators is a laborious task that can lead to biased and inhomogeneous labels. We propose a flexible, semi-automatic framework for labeling data for relation extraction. Furthermore, we provide a dataset of preprocessed sentences from the requirements engineering domain, including a set of automatically created as well as hand-crafted labels. In our case study, we compare the human and automatic labels and show that there is a substantial overlap between both annotations.
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Software Engineering Research
