On the Subjectivity of Emotions in Software Projects: How Reliable are Pre-Labeled Data Sets for Sentiment Analysis?
Marc Herrmann, Martin Obaidi, Larissa Chazette, Jil Klünder

TL;DR
This study examines the reliability of pre-labeled sentiment data sets in software projects by comparing them with perceptions of software team members, revealing significant individual disagreement and the impact of labeling guidelines.
Contribution
It provides empirical evidence on the subjectivity of sentiment labels in software project data sets and highlights the importance of labeling guidelines for better reliability.
Findings
Median participant perception matches the predefined labels in 62.5% of cases
No participant fully agrees with predefined labels
Guideline-based data sets perform better than ad hoc labels
Abstract
Social aspects of software projects become increasingly important for research and practice. Different approaches analyze the sentiment of a development team, ranging from simply asking the team to so-called sentiment analysis on text-based communication. These sentiment analysis tools are trained using pre-labeled data sets from different sources, including GitHub and Stack Overflow. In this paper, we investigate if the labels of the statements in the data sets coincide with the perception of potential members of a software project team. Based on an international survey, we compare the median perception of 94 participants with the pre-labeled data sets as well as every single participant's agreement with the predefined labels. Our results point to three remarkable findings: (1) Although the median values coincide with the predefined labels of the data sets in 62.5% of the cases, we…
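The two comparisons described in the abstract (median perception versus predefined label, and each participant's individual agreement) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the numeric polarity encoding (-1 negative, 0 neutral, 1 positive), the function names, and the toy ratings are all assumptions made for the example.

```python
from statistics import median

def median_agreement(predefined, ratings):
    """Share of statements whose median participant rating equals the predefined label."""
    hits = sum(1 for label, row in zip(predefined, ratings) if median(row) == label)
    return hits / len(predefined)

def participant_agreement(predefined, ratings):
    """For each participant, the share of statements rated the same as the predefined label."""
    n_participants = len(ratings[0])
    return [
        sum(1 for label, row in zip(predefined, ratings) if row[p] == label) / len(predefined)
        for p in range(n_participants)
    ]

# Hypothetical toy data (not from the paper): -1 negative, 0 neutral, 1 positive.
predefined = [1, -1, 0, 1]   # one predefined label per statement
ratings = [                  # one row per statement, one column per participant
    [1, 1, 0],
    [-1, 0, -1],
    [1, 0, 1],
    [1, 1, 1],
]

print(median_agreement(predefined, ratings))
print(participant_agreement(predefined, ratings))
```

With an even number of participants, `statistics.median` averages the two middle values, so a split vote can yield a value such as 0.5 that matches no label; this sketch simply counts that as disagreement.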
Taxonomy
Topics: Software Engineering Research · Sentiment Analysis and Opinion Mining · Hate Speech and Cyberbullying Detection
