On the validity of pre-trained transformers for natural language processing in the software engineering domain
Julian von der Mosel, Alexander Trautsch, Steffen Herbold

TL;DR
This paper evaluates the effectiveness of pre-trained transformer models in the software engineering domain, comparing models trained on domain-specific data versus general data across vocabulary, understanding, and classification tasks.
Contribution
It provides an empirical comparison of domain-specific and general transformer models, highlighting when domain-specific pre-training improves performance in software engineering tasks.
Findings
Domain-specific models excel in software engineering context understanding.
General models are sufficient for general language tasks within software engineering.
Pre-training on software engineering data benefits context-specific classification tasks.
Abstract
Transformers are the current state-of-the-art of natural language processing in many domains and are gaining traction within software engineering research as well. Such models are pre-trained on large amounts of data, usually from the general domain. However, we have only a limited understanding of the validity of transformers within the software engineering domain, i.e., how well such models understand words and sentences within a software engineering context and how this improves the state-of-the-art. In this article, we shed light on this complex but crucial issue. We compare BERT transformer models trained with software engineering data with transformers based on general domain data in multiple dimensions: their vocabulary, their ability to understand which words are missing, and their performance in classification tasks. Our results show that for tasks that…
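The vocabulary dimension of such a comparison can be illustrated with a minimal sketch. It assumes each model's vocabulary is available as a plain set of token strings (for example, read from a WordPiece vocabulary file). The function name `vocabulary_overlap` and the use of the Jaccard index are illustrative choices, not necessarily the measures used in the paper.

```python
def vocabulary_overlap(vocab_a: set[str], vocab_b: set[str]) -> dict[str, float]:
    """Compare two token vocabularies (illustrative measure, not the paper's).

    Returns the number of shared tokens, the Jaccard index
    |A & B| / |A | B|, and the tokens unique to each vocabulary.
    """
    shared = vocab_a & vocab_b
    union = vocab_a | vocab_b
    return {
        "shared": len(shared),
        # Guard against division by zero when both vocabularies are empty.
        "jaccard": len(shared) / len(union) if union else 0.0,
        "only_a": len(vocab_a - vocab_b),
        "only_b": len(vocab_b - vocab_a),
    }


if __name__ == "__main__":
    # Toy vocabularies (hypothetical): a general-domain one and a software engineering one.
    general = {"the", "bug", "run", "apple"}
    software_engineering = {"the", "bug", "run", "commit", "refactor"}
    print(vocabulary_overlap(general, software_engineering))
```

With these toy sets, three tokens are shared out of six distinct tokens, so the Jaccard index is 0.5. Tokens such as `commit` and `refactor` appear only in the domain vocabulary. A general model would have to split such tokens into subword pieces, which is one intuition for why domain-specific pre-training can matter.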
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Softmax · Attention Dropout · Dense Connections · Dropout · Adam
