On the validity of pre-trained transformers for natural language processing in the software engineering domain

Julian von der Mosel; Alexander Trautsch; Steffen Herbold

arXiv:2109.04738·cs.SE·May 16, 2022

TL;DR

This paper evaluates the effectiveness of pre-trained transformer models in the software engineering domain, comparing models trained on domain-specific data versus general data across vocabulary, understanding, and classification tasks.

Contribution

It provides an empirical comparison of domain-specific and general transformer models, highlighting when domain-specific pre-training improves performance in software engineering tasks.

Findings

01 Domain-specific models excel at understanding software engineering context.

02 General models are sufficient for general language tasks within software engineering.

03 Pre-training on software engineering data benefits context-specific classification tasks.

Abstract

Transformers are the current state of the art in natural language processing in many domains and are gaining traction within software engineering research as well. Such models are pre-trained on large amounts of data, usually from the general domain. However, we only have a limited understanding regarding the validity of transformers within the software engineering domain, i.e., how good such models are at understanding words and sentences within a software engineering context and how this improves the state of the art. Within this article, we shed light on this complex but crucial issue. We compare BERT transformer models trained with software engineering data with transformers based on general domain data in multiple dimensions: their vocabulary, their ability to understand which words are missing, and their performance in classification tasks. Our results show that for tasks that…
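The vocabulary dimension of this comparison can be pictured with a minimal sketch. It is not the authors' procedure: the helper name `vocabulary_overlap` and the two toy word lists are hypothetical, chosen only to show how one might quantify how far a software engineering vocabulary diverges from a general-domain one.

```python
def vocabulary_overlap(vocab_a, vocab_b):
    """Compare two vocabularies and report shared and exclusive tokens.

    Hypothetical helper for illustration; real tokenizer vocabularies
    would hold tens of thousands of subword entries.
    """
    a, b = set(vocab_a), set(vocab_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard similarity: shared tokens divided by all distinct tokens
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy vocabularies (assumed data, not taken from the paper)
general_vocab = ["the", "bug", "fix", "tree", "python", "river"]
se_vocab = ["the", "bug", "fix", "refactor", "python", "stacktrace"]

stats = vocabulary_overlap(general_vocab, se_vocab)
print(stats)
```

A low Jaccard value would suggest that many domain terms are absent from the general vocabulary, which is one plausible reason domain-specific pre-training could help.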

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Softmax · Attention Dropout · Dense Connections · Dropout · Adam