A Comprehensive Study on the Use of Word Embedding Models in Software Engineering Domain

Xiaohan Chen; Weiqin Zou; Lianyi Zhi; Qianshuang Meng; Jingxuan Zhang

arXiv:2505.17634 · cs.SE · May 26, 2025

TL;DR

This paper analyzes how word embedding models are used in software engineering. It compares embedding methods and training strategies, and discusses the challenges that must be addressed to improve semantic representations of software artifacts.

Contribution

It systematically reviews 181 studies, offering a unified view of current practices and identifying challenges in applying word embeddings to software engineering tasks.

Findings

01. Systematic overview of word embedding (WE) applications in software engineering (SE)

02. Comparison of WE with traditional semantic methods

03. Identification of challenges in adopting WE in SE

Abstract

Word embedding (WE) techniques are advanced textual semantic representation models that originated in the natural language processing (NLP) area. Inspired by their effectiveness in various NLP tasks, a growing number of researchers have adopted WE models for software engineering (SE) tasks, in which the semantic representation of software artifacts such as bug reports and code snippets is the basis for further model building. However, existing studies are generally isolated from each other and lack comprehensive comparison and discussion. As a result, the best practices for this cross-discipline technique adoption remain buried in scattered papers, and current progress in the semantic representation of SE artifacts is hard to assess. To this end, we decided to perform a comprehensive study on the use of WE models in the SE domain. 181 primary studies published in…
