Sentiment Analysis in Software Engineering: Evaluating Generative Pre-trained Transformers
KM Khalid Saifullah, Faiaz Azmain, Habiba Hye

TL;DR
This paper evaluates the performance of generative pre-trained transformers like GPT-4o-mini against traditional models like BERT for sentiment analysis in software engineering, highlighting the impact of dataset complexity and fine-tuning.
Contribution
It provides a systematic benchmark of GPT-4o-mini and BERT in SE sentiment analysis, revealing conditions where generative models outperform or match traditional models.
Findings
Fine-tuned GPT-4o-mini achieves high F1-scores on structured datasets.
Default GPT-4o-mini generalizes better on complex, imbalanced datasets.
Trade-offs exist between fine-tuning and using pre-trained models for SE sentiment analysis.
Abstract
Sentiment analysis plays a crucial role in understanding developer interactions, issue resolutions, and project dynamics within software engineering (SE). While traditional SE-specific sentiment analysis tools have made significant strides, they often fail to account for the nuanced and context-dependent language inherent to the domain. This study systematically evaluates the performance of bidirectional transformers, such as BERT, against generative pre-trained transformers, specifically GPT-4o-mini, in SE sentiment analysis. Using datasets from GitHub, Stack Overflow, and Jira, we benchmark the models' capabilities with fine-tuned and default configurations. The results reveal that fine-tuned GPT-4o-mini performs comparably to BERT and other bidirectional models on structured and balanced datasets like GitHub and Jira, achieving macro-averaged F1-scores of 0.93 and 0.98, respectively.…
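The abstract reports macro-averaged F1-scores, a metric that weights every sentiment class equally regardless of how often it occurs. The following is a minimal sketch of how such a score could be computed; the function name `macro_f1` and the toy labels are hypothetical illustrations, not the paper's evaluation code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels: an imbalanced set where the model always says "neutral".
y_true = ["neutral"] * 8 + ["positive", "negative"]
y_pred = ["neutral"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}, macro-F1 = {macro_f1(y_true, y_pred):.2f}")
```

On this toy input, accuracy looks high because the majority class dominates, while macro-F1 is pulled down by the two classes that are never predicted. This is one reason macro-averaging is a common choice for imbalanced sentiment datasets.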
Taxonomy
Topics: Topic Modeling · Scientific Computing and Data Management · Software Engineering Research
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Softmax · Attention Dropout · WordPiece · Linear Layer · Residual Connection · Weight Decay · Dropout
