Analysing Text in Software Projects
S. Wagner, D. Méndez Fernández

TL;DR
This paper reviews methods for qualitative and quantitative analysis of textual data in software projects, highlighting techniques like N-Grams, clone detection, and NLP, supported by industrial case studies.
Contribution
It provides a comprehensive overview of text analysis methods in software engineering, combining manual, mixed, and automated approaches with practical industrial examples.
Findings
Text analysis methods are crucial for managing the large volumes of textual data produced in software projects.
NLP and clone detection effectively identify patterns and similarities in software texts.
Industrial case studies demonstrate practical applications of these methods.
Abstract
Most of the data produced in software projects is of a textual nature: source code, specifications, or documentation. The advances in quantitative analysis methods drove a lot of data analytics in software engineering. This has overshadowed to some degree the importance of texts and their qualitative analysis. Such analysis has, however, merits for researchers and practitioners as well. In this chapter, we describe the basics of analysing text in software projects. We first describe how to manually analyse and code textual data. Next, we give an overview of mixed methods for automatic text analysis, including N-Grams and clone detection, as well as more sophisticated natural language processing that identifies the syntax and contexts of words. Those methods and tools are of critical importance in meeting the challenges posed by today's huge amounts of textual data. We illustrate the introduced methods…
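To make the N-Gram idea mentioned in the abstract concrete, here is a minimal Python sketch. It is not taken from the chapter: the function names `word_ngrams` and `repeated_ngrams`, the whitespace tokenisation, and the sample requirement text are all assumptions made for illustration. It counts word N-Grams and treats N-Grams that occur more than once as a crude hint of duplicated (cloned) text. Real clone detection tools are considerably more elaborate.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return all word n-grams in `text` as tuples (naive whitespace tokenisation)."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repeated_ngrams(text, n):
    """Return n-grams occurring more than once: a crude signal of cloned text."""
    counts = Counter(word_ngrams(text, n))
    return {gram: count for gram, count in counts.items() if count > 1}


# Hypothetical requirement text, invented for this example.
spec = (
    "the system shall log every request "
    "and the system shall log every error"
)

bigram_counts = Counter(word_ngrams(spec, 2))
clones = repeated_ngrams(spec, 4)

for gram, count in clones.items():
    print(" ".join(gram), count)
```

With this sample, the two 4-grams "the system shall log" and "system shall log every" each appear twice, which points to the repeated phrasing. A longer `n` reports fewer but longer duplicated passages.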
