Semi-Automated Labeling of Requirement Datasets for Relation Extraction

Jeremias Bohn; Jannik Fischbach; Martin Schmitt; Hinrich Schütze; Andreas Vogelsang

arXiv:2109.02050·cs.SE·September 7, 2021

Open Access

TL;DR

This paper introduces a semi-automatic framework for labeling requirement datasets for relation extraction, reducing manual effort and bias, and provides a new dataset with both automatic and manual labels.

Contribution

The paper presents a novel semi-automatic labeling framework and a new dataset for relation extraction in requirements engineering.

Findings

01. Significant overlap between automatic and manual labels

02. Framework reduces manual labeling effort

03. Dataset facilitates future research in relation extraction

Abstract

Creating datasets manually by human annotators is a laborious task that can lead to biased and inhomogeneous labels. We propose a flexible, semi-automatic framework for labeling data for relation extraction. Furthermore, we provide a dataset of preprocessed sentences from the requirements engineering domain, including a set of automatically created as well as hand-crafted labels. In our case study, we compare the human and automatic labels and show that there is a substantial overlap between both annotations.
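
The comparison of human and automatic labels described in the abstract can be illustrated with a small sketch. The abstract does not state which overlap measure the authors use, so everything below is an assumption made for illustration: labels are modeled as sets of hypothetical (head, relation, tail) triples per sentence ID, the function name `label_overlap` is invented here, and the manual labels are treated as the reference when computing precision, recall, F1, and Jaccard overlap.

```python
from typing import Dict, Set, Tuple

# Hypothetical label format (an assumption, not the paper's schema):
# each sentence ID maps to a set of (head, relation, tail) triples.
Triple = Tuple[str, str, str]
Labels = Dict[str, Set[Triple]]


def label_overlap(manual: Labels, automatic: Labels) -> Dict[str, float]:
    """Compare automatic labels against manual ones, treating manual as reference."""
    tp = fp = fn = 0
    # Iterate over every sentence that appears in either annotation.
    for sentence_id in manual.keys() | automatic.keys():
        gold = manual.get(sentence_id, set())
        pred = automatic.get(sentence_id, set())
        tp += len(gold & pred)   # triples both annotations agree on
        fp += len(pred - gold)   # triples only the automatic labeler produced
        fn += len(gold - pred)   # triples only the human annotator produced

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Jaccard overlap: shared triples divided by all distinct triples.
    union = tp + fp + fn
    jaccard = tp / union if union else 1.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


if __name__ == "__main__":
    # Toy data, invented purely for demonstration.
    manual = {
        "s1": {("button pressed", "cause", "door opens")},
        "s2": {("timeout", "cause", "session ends")},
    }
    automatic = {
        "s1": {("button pressed", "cause", "door opens")},
        "s2": {("timeout", "cause", "user logged out")},
    }
    print(label_overlap(manual, automatic))
```

Exact-match comparison of triples is the strictest choice; a more lenient variant could count partially overlapping spans as agreement, which would report a higher overlap on the same data.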

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Software Engineering Research