Building a Pilot Software Quality-in-Use Benchmark Dataset
Issa Atoum, Chih How Bong, Narayanan Kulathuramaiyer

TL;DR
This paper introduces a new annotated dataset for software quality-in-use, created through expert annotation and reconciliation, to support sentiment analysis and model evaluation in software quality assessment.
Contribution
It presents a novel domain-specific dataset and annotation scheme for software quality-in-use, enabling improved supervised learning and model evaluation.
Findings
Achieved moderate to substantial inter-annotator agreement
Dataset suitable for sentiment analysis in software quality
Annotation scheme can extend dataset coverage
Abstract
Prepared domain-specific datasets play an important role in supervised learning approaches. In this article, a new sentence dataset for software quality-in-use is proposed. Three experts were chosen to annotate the data using a proposed annotation scheme. The annotations were then reconciled in a "no match, eliminate" process to reduce bias. The Kappa (κ) statistic revealed an acceptable level of agreement between the experts, ranging from moderate to substantial. The resulting dataset can be used to evaluate software quality-in-use models in sentiment analysis. Moreover, the annotation scheme can be used to extend the current dataset.
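The two steps named in the abstract, agreement measurement and reconciliation, can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: it assumes pairwise Cohen's kappa (the abstract does not say which kappa variant was used), and it reads "no match eliminate" as dropping any sentence on which fewer than `min_agree` annotators chose the same label. The function names, the `min_agree` parameter, and the placeholder labels "A", "B", "C" are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappa(annotations):
    """Kappa for every pair of annotators; `annotations` maps name -> labels."""
    return {
        (x, y): cohen_kappa(annotations[x], annotations[y])
        for x, y in combinations(sorted(annotations), 2)
    }


def reconcile(labels_per_sentence, min_agree=2):
    """Assumed reading of 'no match, eliminate': keep the most common label
    when at least `min_agree` annotators chose it, otherwise drop the sentence."""
    kept = {}
    for sentence_id, labels in labels_per_sentence.items():
        label, count = Counter(labels).most_common(1)[0]
        if count >= min_agree:
            kept[sentence_id] = label
    return kept


# Hypothetical toy data: three annotators, four sentences, placeholder labels.
annotations = {
    "expert1": ["A", "A", "B", "C"],
    "expert2": ["A", "B", "B", "A"],
    "expert3": ["A", "A", "B", "B"],
}
per_sentence = {
    f"s{i}": [annotations[name][i] for name in sorted(annotations)]
    for i in range(4)
}
gold = reconcile(per_sentence)        # "s3" is dropped: all three labels differ
scores = pairwise_kappa(annotations)
```

On the commonly used Landis and Koch scale, kappa values of 0.41 to 0.60 are read as moderate agreement and 0.61 to 0.80 as substantial agreement, which matches the wording of the abstract. Setting `min_agree=3` would give a stricter reconciliation that keeps only unanimously labeled sentences.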
Taxonomy
Topics: Software Engineering Research · Sentiment Analysis and Opinion Mining · Topic Modeling
