BERT_SE: A Pre-trained Language Representation Model for Software Engineering
Eliane Maria De Bortoli Fávero, Dalcimar Casanova

TL;DR
This paper introduces BERT_SE, a domain-specific pre-trained language model for software engineering, which improves software requirements classification accuracy by 13% over the generic BERT_base model.
Contribution
The paper presents BERT_SE, a novel contextualized embedding model tailored for software engineering, addressing the lack of domain-specific NLP tools in SE.
Findings
BERT_SE outperforms BERT_base by 13% in classification tasks.
The model effectively recognizes domain-specific terms in SE.
Code and models are publicly available for research use.
Abstract
Natural Language Processing (NLP) has become highly relevant in several areas. In software engineering (SE), NLP applications are based on classifying similar texts (e.g., software requirements) and are applied to tasks such as software effort estimation and human resource selection. Classifying software requirements is a complex task, given the informality and complexity inherent in the texts produced during the software development process. Pre-trained embedding models are a viable alternative, given the low volume and poor quality of labeled textual data in software engineering. Although there is much research on the application of word embeddings in several areas, to date we know of no studies that have explored their application in the creation of…
