Predicting Issue Types with seBERT

Alexander Trautsch; Steffen Herbold

arXiv:2205.01335 · cs.SE · May 4, 2022

TL;DR

This paper fine-tunes seBERT, a BERT-based transformer pre-trained from scratch on software engineering data, for issue type prediction. The fine-tuned model outperforms the fastText baseline with an overall F1-score of 85.7%.

Contribution

The paper fine-tunes seBERT, an existing transformer model pre-trained on software engineering data, for the issue type prediction task of the NLBSE challenge, and shows that it beats the fastText baseline in recall, precision, and overall F1-score.

Findings

01

seBERT achieves an F1-score of 85.7%.

02

seBERT outperforms the fastText baseline in both recall and precision.

03

The improvement over the baseline holds for each of the three issue types.
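
The findings are stated in terms of precision, recall, and F1-score. As a minimal sketch of how such scores are computed for a three-class issue-type task, the Python below derives one-vs-rest precision, recall, and F1 per class, plus micro- and macro-averaged F1. The page does not say which averaging the "overall" F1-score uses, so both are shown; the label names, gold labels, and predictions are invented for illustration.

```python
def confusion_counts(y_true, y_pred, label):
    """Count true positives, false positives, and false negatives for one label (one-vs-rest)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts; 0.0 wherever a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def per_class_scores(y_true, y_pred, labels):
    """Map each label to its (precision, recall, F1) tuple."""
    return {label: prf(*confusion_counts(y_true, y_pred, label)) for label in labels}


def micro_f1(y_true, y_pred, labels):
    """Pool TP/FP/FN counts over all labels before computing F1."""
    tp = fp = fn = 0
    for label in labels:
        c_tp, c_fp, c_fn = confusion_counts(y_true, y_pred, label)
        tp, fp, fn = tp + c_tp, fp + c_fp, fn + c_fn
    return prf(tp, fp, fn)[2]


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_scores(y_true, y_pred, labels)
    return sum(f1 for _, _, f1 in scores.values()) / len(labels)


# Hypothetical gold labels and predictions for six issues.
labels = ["bug", "enhancement", "question"]
y_true = ["bug", "bug", "enhancement", "question", "bug", "enhancement"]
y_pred = ["bug", "enhancement", "enhancement", "question", "bug", "bug"]

for label, (p, r, f) in per_class_scores(y_true, y_pred, labels).items():
    print(f"{label:12s} P={p:.3f} R={r:.3f} F1={f:.3f}")
print(f"micro-F1={micro_f1(y_true, y_pred, labels):.3f}")
print(f"macro-F1={macro_f1(y_true, y_pred, labels):.3f}")
```

With single-label data where every prediction is one of the known labels, micro-averaged precision and recall both reduce to accuracy (here 4 of 6 issues correct), while the macro average weights each issue type equally regardless of how many issues it has.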

Abstract

Pre-trained transformer models are the current state of the art for natural language processing. seBERT is such a model: it was developed based on the BERT architecture but trained from scratch with software engineering data. We fine-tuned this model for the task of issue type prediction in the NLBSE challenge. Our model dominates the fastText baseline for all three issue types in both recall and precision, achieving an overall F1-score of 85.7%, an increase of 4.1% over the baseline.
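
Fine-tuning a BERT-style model for classification typically places a linear layer over the pooled sequence representation and applies a softmax over the classes; the page's method taxonomy lists Linear Layer and Softmax. The sketch below shows only that head in plain Python, assuming a standard BERT sequence-classification setup rather than anything confirmed about seBERT's exact configuration. The function `classify`, the 2-dimensional toy embedding, the weights, and the label names are all hypothetical; a real seBERT embedding has far more dimensions.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(pooled, weights, bias, labels):
    """Hypothetical head: linear layer (one weight row per class), then softmax.

    Returns the highest-probability label and the full probability vector.
    """
    logits = [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Toy "pooled embedding" and hypothetical weights for three issue types.
labels = ["bug", "enhancement", "question"]
weights = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0, 0.0]
label, probs = classify([1.0, 0.0], weights, bias, labels)
print(label, [round(p, 3) for p in probs])
```

During fine-tuning, both this head and the pre-trained encoder weights are usually updated against a cross-entropy loss on the labelled issues.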

Code & Models

Repositories

atrautsch/nlbse2022_replication_kit (official)

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Software System Performance and Reliability

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Weight Decay · Adam · Attention Dropout · Dense Connections · Dropout · Linear Warmup With Linear Decay
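
Linear Warmup With Linear Decay is the learning-rate schedule commonly used when fine-tuning BERT models with Adam and weight decay. A minimal sketch, assuming the usual definition (a linear ramp from 0 to the base rate over the warmup steps, then a linear decay to 0 at the final step), which has the same shape as Hugging Face Transformers' `get_linear_schedule_with_warmup`. The peak rate and step counts in the usage lines are hypothetical, not the paper's hyperparameters.

```python
def linear_warmup_linear_decay(step, base_lr, warmup_steps, total_steps):
    """Learning rate at a 0-indexed optimizer step.

    Ramps linearly from 0 to base_lr over warmup_steps, then decays linearly
    to 0 at total_steps and stays at 0 afterwards.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical values: peak rate 2e-5, 100 warmup steps, 1,000 steps in total.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_linear_decay(step, 2e-5, 100, 1000))
```

In PyTorch, the same curve divided by `base_lr` could be passed as the multiplier function of `torch.optim.lr_scheduler.LambdaLR`.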