Term Interrelations and Trends in Software Engineering

Janusan Baskararajah; Lei Zhang; Andriy Miranskyy

arXiv:2108.09529 · cs.SE · August 24, 2021

TL;DR

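This paper presents a prototype tool that uses word embeddings trained on an SE corpus (the SE Body of Knowledge handbook plus the titles and abstracts of 15,233 research papers) to extract term interrelations and trends, helping experts keep up with new work and newcomers enter the field.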

Contribution

It introduces a novel approach to analyzing SE literature that applies word embedding techniques to identify relationships between terms and their trends.
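As a rough illustration of how an embedding can surface term interrelations, the sketch below ranks terms by cosine similarity to a query term. It is a minimal sketch under stated assumptions, not the authors' implementation: the vectors in `EMBEDDINGS` are hypothetical three-dimensional toys, and `cosine` and `nearest_terms` are illustrative helper names. In practice the vectors would come from an embedding model trained on the SE corpus and would have many more dimensions.

```python
import math

# Hypothetical toy vectors; real embeddings would be learned from the SE corpus.
EMBEDDINGS = {
    "testing": [0.9, 0.1, 0.0],
    "verification": [0.8, 0.2, 0.1],
    "debugging": [0.7, 0.3, 0.0],
    "requirements": [0.1, 0.9, 0.2],
    "elicitation": [0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_terms(term, embeddings, k=2):
    """Return the k terms most similar to `term`, excluding the term itself."""
    query = embeddings[term]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


print(nearest_terms("testing", EMBEDDINGS))  # → ['verification', 'debugging']
```

Listing a term's nearest neighbours in this way is one simple means of summarizing what a term is related to in a corpus.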

Findings

1. Embeddings trained on SE texts can summarize relations between terms.
2. The tool can help uncover emerging trends in software engineering.
3. Validation test cases support the effectiveness of the trained embeddings.
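One common way to write validation test cases for embeddings is to assert that a term lies closer to a related term than to an unrelated one. The following is a hypothetical sketch of that idea; the paper's actual test cases may be constructed differently, and the terms, the vectors, and the `run_tests` helper are illustrative assumptions.

```python
import math

# Hypothetical toy vectors standing in for embeddings trained on an SE corpus.
EMBEDDINGS = {
    "bug": [0.9, 0.1],
    "defect": [0.85, 0.2],
    "requirement": [0.1, 0.95],
    "stakeholder": [0.2, 0.9],
}

# Hypothetical test cases: (term, expected closer term, expected farther term).
TEST_CASES = [
    ("bug", "defect", "stakeholder"),
    ("requirement", "stakeholder", "bug"),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def run_tests(embeddings, cases):
    """Return the test cases that fail (an empty list means all passed)."""
    failures = []
    for term, closer, farther in cases:
        sim_closer = cosine(embeddings[term], embeddings[closer])
        sim_farther = cosine(embeddings[term], embeddings[farther])
        if sim_closer <= sim_farther:
            failures.append((term, closer, farther))
    return failures


print(run_tests(EMBEDDINGS, TEST_CASES))  # → []
```

Re-running such a suite after each training run gives a quick signal that the embeddings still capture the expected domain relations.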

Abstract

The Software Engineering (SE) community is prolific, making it challenging for experts to keep up with the flood of new papers and for neophytes to enter the field. Therefore, we posit that the community may benefit from a tool that extracts terms and their interrelations from the SE community's text corpus and shows the terms' trends. In this paper, we build a prototype tool using the word embedding technique. We train the embeddings on the SE Body of Knowledge handbook and on the titles and abstracts of 15,233 research papers. We also create the test cases needed to validate the training of the embeddings. We provide representative examples showing that the embeddings may aid in summarizing terms and uncovering trends in the knowledge base.
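The abstract does not say how term trends are computed. A simple stand-in, assumed here for illustration only, is the fraction of documents per publication year that mention a term. The records in `PAPERS` and the `term_trend` helper are hypothetical, and case-insensitive substring matching is crude: it will also count the term when it appears inside longer phrases.

```python
from collections import Counter

# Hypothetical (year, title) records standing in for the paper corpus.
PAPERS = [
    (2015, "Mining software repositories for defect prediction"),
    (2016, "Deep learning for code search"),
    (2016, "Defect prediction with process metrics"),
    (2017, "Deep learning based program repair"),
    (2017, "Deep learning for code completion"),
]


def term_trend(term, papers):
    """Fraction of documents per year whose text mentions `term` (case-insensitive)."""
    totals = Counter()
    hits = Counter()
    for year, text in papers:
        totals[year] += 1
        if term.lower() in text.lower():
            hits[year] += 1
    # Years are emitted in ascending order so the trend reads chronologically.
    return {year: hits[year] / totals[year] for year in sorted(totals)}


print(term_trend("deep learning", PAPERS))  # → {2015: 0.0, 2016: 0.5, 2017: 1.0}
```

Normalizing by the number of documents per year keeps a growing corpus from masquerading as a rising trend.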

Code & Models

Repositories

miranska/se-tti (official)
