Can You Explain That, Better? Comprehensible Text Analytics for SE Applications
Amritanshu Agrawal, Huy Tu, Tim Menzies

TL;DR
This paper demonstrates that combining LDA with FFTs creates simple, human-readable models for classifying software bug reports, achieving comparable or better accuracy than complex methods while being faster and easier to interpret.
Contribution
It introduces a novel approach using LDA and FFTs for software bug report classification, emphasizing simplicity and interpretability over complexity.
Findings
LDA+FFT models are small and human-readable.
LDA+FFTs achieve similar or better accuracy than complex models.
LDA+FFTs are faster to generate and easier to understand.
Abstract
Text mining methods are used for a wide range of Software Engineering (SE) tasks. The biggest challenge of text mining is high dimensional data, i.e., a corpus of documents can contain 10^4 to 10^6 unique words. To address this complexity, some very convoluted text mining methods have been applied. Is that complexity necessary? Are there simpler ways to quickly generate models that perform as well as the more convoluted methods and are also human-readable? To answer these questions, we explore a combination of LDA (Latent Dirichlet Allocation) and FFTs (Fast and Frugal Trees) to classify NASA software bug reports from six different projects. Designed using principles from psychological science, FFTs return very small models that are human-comprehensible. When compared to the commonly used text mining method and a recent state-of-the-art system (search-based SE method that…
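To make the LDA+FFT idea concrete, here is a minimal sketch of the second half of that pipeline. It assumes the LDA step has already turned each bug report into a short vector of topic weights (for example via scikit-learn's LatentDirichletAllocation); the toy weights, the labels, the depth limit of 4, and the function names fit_fft and predict_fft are all hypothetical. The greedy cue selection is a simplification for illustration, not the authors' exact procedure.

```python
from statistics import median

def fit_fft(rows, labels, max_depth=4):
    """Greedy sketch of a fast-and-frugal tree over topic weights.

    Each level holds one cue (topic index, cut) and one exit label;
    rows that do not exit fall through to the next level.
    """
    cues = []
    remaining = list(range(len(rows)))
    n_features = len(rows[0])
    for _ in range(max_depth):
        if len({labels[i] for i in remaining}) < 2:
            break  # only one class left, nothing more to split
        best = None
        for f in range(n_features):
            cut = median(rows[i][f] for i in remaining)
            for exit_label in (1, 0):
                # exit 1 fires when the weight is above the cut,
                # exit 0 fires when it is at or below the cut
                fired = [i for i in remaining
                         if (rows[i][f] > cut) == (exit_label == 1)]
                if not fired or len(fired) == len(remaining):
                    continue
                purity = sum(labels[i] == exit_label for i in fired) / len(fired)
                if best is None or purity > best[0]:
                    best = (purity, f, cut, exit_label, fired)
        if best is None:
            break
        _, f, cut, exit_label, fired = best
        cues.append((f, cut, exit_label))
        fired_set = set(fired)
        remaining = [i for i in remaining if i not in fired_set]
    # rows that never exit get the majority label of what is left
    ones = sum(labels[i] for i in remaining)
    default = 1 if 2 * ones >= len(remaining) else 0
    return cues, default

def predict_fft(model, row):
    """Walk the cues in order; the first cue that fires decides the label."""
    cues, default = model
    for f, cut, exit_label in cues:
        if (row[f] > cut) == (exit_label == 1):
            return exit_label
    return default

# Hypothetical document-topic weights (3 topics) for six bug reports;
# label 1 = severe, 0 = not severe. These numbers are made up.
rows = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.1, 0.3],
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7],
]
labels = [1, 1, 1, 0, 0, 0]

model = fit_fft(rows, labels)
for topic, cut, exit_label in model[0]:
    op = ">" if exit_label == 1 else "<="
    print(f"if topic_{topic} {op} {cut:.2f} then predict {exit_label}")
print(f"else predict {model[1]}")
```

The printed rules are the whole model: a short chain of if/then exits over topic weights, which is what makes this kind of tree easy to read compared with a classifier over thousands of word features.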
Taxonomy
Topics: Software Engineering Research · Advanced Text Analysis Techniques · Topic Modeling
