TL;DR
This paper introduces Senti4SD, a sentiment classifier tailored for software developers' communication, trained on Stack Overflow data to improve accuracy over general-purpose tools in technical contexts.
Contribution
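The paper presents Senti4SD, a sentiment analysis tool designed specifically for software engineering. It is trained and validated on a manually annotated gold standard of Stack Overflow questions, answers, and comments, and it combines lexicon-based, keyword-based, and word-embedding features to improve classification accuracy.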
Findings
Senti4SD outperforms a mainstream off-the-shelf sentiment analysis tool, used as the baseline, in classifying the sentiment polarity of developer communications.
Compared with the baseline, it reduces the misclassification of neutral and positive posts as negative.
The authors release the classifier and resources for reproducibility.
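To make the second finding concrete, the error being reduced can be measured as the share of gold-standard neutral and positive posts that a classifier labels negative. The sketch below assumes labels are the strings "positive", "neutral", and "negative"; the function name and the toy predictions are hypothetical illustrations, not data or code from the paper.

```python
from typing import Iterable, Tuple


def non_negative_as_negative_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of gold neutral/positive items that were predicted negative.

    pairs: (gold_label, predicted_label) tuples; labels are assumed to be
    "positive", "neutral", or "negative" (a hypothetical label scheme).
    """
    total = 0
    errors = 0
    for gold, pred in pairs:
        if gold in ("neutral", "positive"):
            total += 1
            if pred == "negative":
                errors += 1
    # Avoid division by zero when no neutral/positive items are present.
    return errors / total if total else 0.0


# Hypothetical toy data, not results from the paper.
gold = ["neutral", "positive", "neutral", "negative", "neutral", "positive"]
baseline_pred = ["negative", "positive", "negative", "negative", "neutral", "negative"]
senti4sd_pred = ["neutral", "positive", "negative", "negative", "neutral", "positive"]

print(f"{non_negative_as_negative_rate(zip(gold, baseline_pred)):.2f}")  # prints 0.60
print(f"{non_negative_as_negative_rate(zip(gold, senti4sd_pred)):.2f}")  # prints 0.20
```

A lower value means fewer neutral or positive posts, such as routine problem reports written in technical jargon, are flagged as negative.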
Abstract
The role of sentiment analysis is increasingly emerging to study software developers' emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers' communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of…
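The abstract names three feature families: lexicon-based, keyword-based, and semantic features based on word embedding. The following is a minimal sketch, under stated assumptions, of how such features might be computed for a single post. The lexicon, keyword vocabulary, and 3-dimensional embeddings are hypothetical toy values, and computing the semantic features as cosine similarity between a summed document vector and summed polarity "prototype" vectors is an illustrative choice that may differ from Senti4SD's exact definitions.

```python
import math
from typing import Dict, Iterable, List

# Hypothetical toy resources for illustration only; these are NOT
# Senti4SD's actual lexicon, keyword vocabulary, or embedding space.
POSITIVE_WORDS = {"great", "thanks", "works"}
NEGATIVE_WORDS = {"awful", "crash", "broken"}
KEYWORD_VOCAB = ["bug", "error", "fix", "thanks"]
EMBEDDINGS: Dict[str, List[float]] = {
    "great": [0.9, 0.1, 0.0],
    "thanks": [0.8, 0.2, 0.1],
    "works": [0.7, 0.0, 0.2],
    "awful": [-0.9, 0.1, 0.0],
    "crash": [-0.7, 0.3, 0.1],
    "broken": [-0.8, 0.0, 0.2],
    "bug": [-0.2, 0.6, 0.3],
    "fix": [0.3, 0.5, 0.3],
    "error": [-0.3, 0.7, 0.2],
}
DIM = 3


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer; real pipelines must handle code, URLs, etc.
    return text.lower().split()


def sum_vectors(words: Iterable[str]) -> List[float]:
    # Sum the embeddings of known words; out-of-vocabulary words are skipped.
    total = [0.0] * DIM
    for w in words:
        vec = EMBEDDINGS.get(w)
        if vec is not None:
            total = [a + b for a, b in zip(total, vec)]
    return total


def cosine(u: List[float], v: List[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # no known words: treat similarity as zero
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


# Polarity prototypes: sum of the embeddings of each polarity's lexicon words.
POS_PROTOTYPE = sum_vectors(sorted(POSITIVE_WORDS))
NEG_PROTOTYPE = sum_vectors(sorted(NEGATIVE_WORDS))


def extract_features(text: str) -> Dict[str, float]:
    tokens = tokenize(text)
    features: Dict[str, float] = {}
    # Lexicon-based features: number of positive and negative lexicon hits.
    features["lex_pos"] = float(sum(t in POSITIVE_WORDS for t in tokens))
    features["lex_neg"] = float(sum(t in NEGATIVE_WORDS for t in tokens))
    # Keyword-based features: term counts over a fixed vocabulary.
    for kw in KEYWORD_VOCAB:
        features["kw_" + kw] = float(tokens.count(kw))
    # Semantic features: similarity of the document vector to each prototype.
    doc_vec = sum_vectors(tokens)
    features["sim_pos"] = cosine(doc_vec, POS_PROTOTYPE)
    features["sim_neg"] = cosine(doc_vec, NEG_PROTOTYPE)
    return features


print(extract_features("thanks the fix works great"))
```

In a full pipeline, feature dictionaries like these would be vectorized and passed to a supervised classifier trained on the annotated gold standard; that training step is omitted here.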
