Conclusion Stability for Natural Language Based Mining of Design Discussions

Alvi Mahadi; Neil A. Ernst; Karan Tongay

arXiv:2106.09844 · cs.SE · June 21, 2021

TL;DR

This paper explores how to improve the stability and relevance of machine learning models in identifying design-related discussions in developer artifacts across different projects by using augmentation and context-specific techniques.

Contribution

It introduces two techniques, augmentation and context specificity, that significantly enhance the stability and cross-project applicability of design mining models.

Findings

01. Achieved an AUC of 0.88 on within-dataset classification.

02. Achieved an AUC of 0.80 on cross-dataset classification.

03. Showed that, without augmentation and context specificity, conclusion stability is poor across artifact types and projects.

Abstract

Developer discussions range from in-person hallway chats to comment chains on bug reports. Being able to identify discussions that touch on software design would be helpful when documenting and refactoring software. Design mining is the application of machine learning techniques to correctly label a given discussion artifact, such as a pull request, as pertaining (or not) to design. In this paper we demonstrate a simple example of how design mining works. We then show that conclusion stability is poor across different artifact types and different projects. We present two techniques -- augmentation and context specificity -- that greatly improve the conclusion stability and cross-project relevance of design mining. Our new approach achieves an AUC of 0.88 on the within-dataset classification task and 0.80 on the cross-dataset classification task.
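To make the difference between the within-dataset and cross-dataset tasks concrete, here is a minimal sketch of how the two AUC figures could be computed. It is not the authors' pipeline: the word log-odds scorer, the function names, and the three tiny "project" samples are all hypothetical and made up for illustration, and the sketch uses neither augmentation nor context specificity.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would do far more.
    return text.lower().split()


def train_log_odds(examples):
    """Learn a per-word log-odds weight for the 'design' class (label 1)
    from (text, label) pairs, using add-one smoothing."""
    pos, neg = Counter(), Counter()
    for text, label in examples:
        (pos if label == 1 else neg).update(tokenize(text))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        w: math.log((pos[w] + 1) / pos_total) - math.log((neg[w] + 1) / neg_total)
        for w in vocab
    }


def score(model, text):
    # Words never seen in training contribute nothing to the score.
    return sum(model.get(tok, 0.0) for tok in tokenize(text))


def auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(model, examples):
    labels = [label for _, label in examples]
    scores = [score(model, text) for text, _ in examples]
    return auc(labels, scores)


# Hypothetical toy data (1 = design discussion, 0 = not design).
project_a_train = [
    ("refactor the module to use the factory pattern", 1),
    ("should this interface depend on the storage layer", 1),
    ("fix typo in readme", 0),
    ("bump version number", 0),
]
project_a_test = [
    ("split the class to decouple the storage layer", 1),
    ("fix typo in changelog", 0),
]
project_b_test = [
    ("the schema couples billing to accounts", 1),
    ("update copyright year", 0),
]

model = train_log_odds(project_a_train)
within_auc = evaluate(model, project_a_test)  # train and test on the same project
cross_auc = evaluate(model, project_b_test)   # test on a project never seen in training
print(f"within-dataset AUC: {within_auc:.2f}")
print(f"cross-dataset AUC:  {cross_auc:.2f}")
```

The only difference between the two evaluations is where the test examples come from. Vocabulary learned on one project may not carry over to another, which is the kind of instability the abstract describes.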

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Advanced Software Engineering Methodologies