Mining developer communication data streams
Andy M. Connor, Jacqui Finlay, Russel Pears

TL;DR
This paper applies data stream mining techniques to developer communication data from a software repository to identify metrics that predict build outcomes, and finds that only a small number of those metrics are significant predictors.
Contribution
It introduces the use of Hoeffding Tree and ADWIN methods to analyze developer communication streams for predicting build success or failure.
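As background on the Hoeffding Tree method named here, the following is a minimal sketch of the standard Hoeffding bound and the split test it supports. It is not the authors' implementation; the function names, the gain values, and the choice of `delta` are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within this epsilon of the mean
    observed over n independent samples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Hypothetical helper: a leaf splits on the best attribute once its
    observed advantage over the runner-up exceeds the Hoeffding bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Illustrative values only (not taken from the paper).
print(should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=200))
print(should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000))
```

The bound shrinks as more examples arrive, so a leaf that cannot yet justify a split may do so later in the stream without revisiting earlier examples.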
Findings
Few communication metrics significantly predict build outcomes
Hoeffding Tree effectively classifies build results
Concept drift detection improves prediction accuracy
Abstract
This paper explores the concept of modelling a software development project as a process that generates a continuous stream of data. In the Jazz repository used in this research, one aspect of that stream is developer communication. Such data can be used to create an evolving social network characterized by a range of metrics. This paper presents the application of data stream mining techniques to identify the most useful metrics for predicting build outcomes. Results are presented from applying the Hoeffding Tree classification method in conjunction with the Adaptive Sliding Window (ADWIN) method for detecting concept drift. The results indicate that only a small number of the metrics considered have any significance for predicting the outcome of a build.
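To illustrate the idea behind ADWIN-style drift detection mentioned in the abstract, here is a simplified sketch. It keeps a plain window rather than ADWIN's compressed bucket structure, assumes inputs in the range [0, 1], and uses the commonly cited cut threshold; the class name `SimpleAdwin` and its parameters are hypothetical and not from the paper.

```python
import math
from collections import deque


class SimpleAdwin:
    """Simplified adaptive window (assumption: values lie in [0, 1]).

    Drops the oldest items whenever some split of the window into an older
    and a newer part shows a difference in means larger than the threshold.
    """

    def __init__(self, delta: float = 0.002, max_window: int = 1000):
        self.delta = delta
        self.window = deque(maxlen=max_window)

    def _has_cut(self) -> bool:
        values = list(self.window)
        n = len(values)
        total = sum(values)
        head_sum = 0.0
        for i in range(1, n):
            head_sum += values[i - 1]
            n0, n1 = i, n - i
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-style size term
            delta_prime = self.delta / n
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                return True
        return False

    def update(self, x: float) -> bool:
        """Add one observation; return True if drift was detected."""
        self.window.append(x)
        drift = False
        # Shrink from the old end until no split is significantly different.
        while len(self.window) >= 2 and self._has_cut():
            self.window.popleft()
            drift = True
        return drift


# Synthetic stream: the mean jumps from 0 to 1 halfway through.
detector = SimpleAdwin()
for t, value in enumerate([0.0] * 200 + [1.0] * 200):
    if detector.update(value):
        print("drift detected at step", t)
        break
```

In a setting like the one described, the monitored value would typically be the classifier's per-example error, so a detected change signals that the model no longer fits recent data.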
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Data Mining Algorithms and Applications
