Can You Explain That, Better? Comprehensible Text Analytics for SE Applications

Amritanshu Agrawal; Huy Tu; Tim Menzies

arXiv:1804.10657 · cs.SE · May 1, 2018 · 1 citation

TL;DR

This paper demonstrates that combining LDA with FFTs creates simple, human-readable models for classifying software bug reports, achieving accuracy comparable to or better than that of complex methods while being faster to build and easier to interpret.

Contribution

The paper introduces a novel approach that uses LDA and FFTs to classify software bug reports, emphasizing simplicity and interpretability over complexity.

Findings

1. LDA+FFT models are small and human-readable.
2. LDA+FFTs achieve accuracy similar to or better than that of complex models.
3. LDA+FFTs are faster to generate and easier to understand.
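To make "small and human-readable" concrete, here is a minimal sketch of a fast-and-frugal tree learner in Python. It is not the authors' implementation: the greedy cue selection (pick the test whose exiting examples are purest), the median thresholds, the binary labels and the default depth limit of 4 are all illustrative assumptions, and the rows below are invented toy data standing in for per-report topic proportions of the kind LDA produces. The defining FFT property is kept: every level tests one feature and lets one branch exit with a decision, so the whole model prints as a few if/then lines.

```python
from statistics import median

# Minimal FFT sketch. Assumptions (not from the paper): binary labels,
# median thresholds, greedy "purest exit" cue selection, default depth 4.


def _passes(x, feat, thr, direction):
    """One FFT test: x[feat] <= thr ("le") or x[feat] > thr ("gt")."""
    return x[feat] <= thr if direction == "le" else x[feat] > thr


def fit_fft(rows, labels, depth=4):
    """Greedily build an FFT; assumes non-empty training data."""
    levels = []
    data = list(zip(rows, labels))
    for _ in range(depth):
        ys = [y for _, y in data]
        if len(set(ys)) < 2:  # all remaining labels agree: stop early
            break
        best = None
        for feat in range(len(data[0][0])):
            thr = median(x[feat] for x, _ in data)
            for direction in ("le", "gt"):
                hit = [(x, y) for x, y in data if _passes(x, feat, thr, direction)]
                miss = [(x, y) for x, y in data if not _passes(x, feat, thr, direction)]
                if not hit or not miss:
                    continue
                for exit_label in (0, 1):
                    # purity of the exiting branch, ties broken by branch size
                    key = (sum(y == exit_label for _, y in hit) / len(hit), len(hit))
                    if best is None or key > best[0]:
                        best = (key, (feat, thr, direction, exit_label), miss)
        if best is None:
            break
        levels.append(best[1])
        data = best[2]  # only the non-exiting examples continue down the tree
    remaining = [y for _, y in data]
    default = max(set(remaining), key=remaining.count)  # majority label reaching the end
    return levels, default


def predict(fft, x):
    """Walk the levels in order; the first passing test decides."""
    levels, default = fft
    for feat, thr, direction, exit_label in levels:
        if _passes(x, feat, thr, direction):
            return exit_label
    return default


def describe(fft, names):
    """Render the tree as plain if/then/else lines."""
    levels, default = fft
    lines = []
    for feat, thr, direction, exit_label in levels:
        op = "<=" if direction == "le" else ">"
        lines.append(f"if {names[feat]} {op} {thr:.2f} then {exit_label}")
    lines.append(f"else {default}")
    return "\n".join(lines)


# Toy data (invented): rows are per-report topic proportions, 1 = target class.
rows = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
        [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.3, 0.3, 0.4]]
labels = [1, 1, 1, 0, 0, 0]
fft = fit_fft(rows, labels)
print(describe(fft, ["topic_0", "topic_1", "topic_2"]))
```

On this toy data the learner stops after a single level, because every example that survives the first test carries the same label, and it prints `if topic_0 <= 0.45 then 0` followed by `else 1`. The topic names are hypothetical labels; with real data, each exit would read as a rule over a named LDA topic.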

Abstract

Text mining methods are used for a wide range of Software Engineering (SE) tasks. The biggest challenge of text mining is high-dimensional data, i.e., a corpus of documents can contain $10^{4}$ to $10^{6}$ unique words. To address this complexity, some very convoluted text mining methods have been applied. Is that complexity necessary? Are there simpler ways to quickly generate models that perform as well as the more convoluted methods and are also human-readable? To answer these questions, we explore a combination of LDA (Latent Dirichlet Allocation) and FFTs (Fast and Frugal Trees) to classify NASA software bug reports from six different projects. Designed using principles from psychological science, FFTs return very small models that are human-comprehensible. When compared to the commonly used text mining method and a recent state-of-the-art system (search-based SE method that…
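The dimensionality problem above (a vocabulary of 10^4 to 10^6 words) is what LDA addresses: it compresses each document into k topic proportions. Below is a minimal sketch of that reduction as a from-scratch collapsed Gibbs sampler in pure Python. Collapsed Gibbs sampling is a standard way to fit LDA, but this is not necessarily the implementation the authors used, and the values of `k`, `alpha`, `beta`, the iteration count and the tokenized "bug reports" are illustrative assumptions.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(k)]  # word counts per topic
    nk = [0] * k                       # total tokens per topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            t = rng.randrange(k)       # random initial topic
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][index[w]] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, t = index[w], z[d][i]
                # remove the current token from the counts
                ndk[d][t] -= 1
                nkw[t][wi] -= 1
                nk[t] -= 1
                # p(z = j | rest) is proportional to
                # (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1
    # theta_dj = (n_dj + alpha) / (n_d + k * alpha); each row sums to 1
    return [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(docs)]


# Toy "bug reports" (invented), already tokenized.
docs = [
    ["crash", "memory", "leak", "crash"],
    ["memory", "leak", "heap", "crash"],
    ["typo", "label", "dialog", "typo"],
    ["label", "dialog", "font", "typo"],
]
theta = lda_gibbs(docs, k=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` is a k-column feature vector, so a downstream classifier (for example an FFT) works over k topics instead of the full vocabulary. The sampler is seeded, so repeated calls with the same arguments return identical proportions.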

Taxonomy

Topics: Software Engineering Research · Advanced Text Analysis Techniques · Topic Modeling