# Mining Treatment-Outcome Constructs from Sequential Software Engineering Data

**Authors:** Maleknaz Nayebi, Guenther Ruhe, Thomas Zimmermann

arXiv: 1901.05604 · 2019-01-18

## TL;DR

This paper introduces the Gandhi-Washington Method (GWM), an analytical approach that automatically mines treatment-outcome constructs from sequential software engineering data, revealing how recurring event sequences impact project outcomes.

## Contribution

The paper presents GWM, a novel method that uses regular expressions and statistical tests to automatically identify significant treatment-outcome constructs in sequential software engineering data.
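As a rough illustration of the regular-expression step, the sketch below condenses encoded event sequences into candidate treatments. The single-letter encoding (`A` for an edit by the file owner, `B` for an edit by someone else), the `condense` helper, and the rule of collapsing each run of a repeated event into `symbol+` are assumptions made for this example, not the encoding or condensation rules defined in the paper.

```python
import re
from collections import Counter


def condense(sequence: str) -> str:
    """Collapse every run of one event symbol into 'symbol+'.

    Hypothetical condensation rule for illustration only:
    'AAAB' and 'AB' both become 'A+B+'.
    """
    # (.)\1* matches one symbol followed by zero or more repeats of it.
    return re.sub(r"(.)\1*", r"\1+", sequence)


# Toy encoded sequences (assumed encoding: A = owner edit, B = non-owner edit).
sequences = ["AAAB", "AB", "AABBB", "BA", "BBBA"]

# Sequences sharing the same condensed pattern are grouped as one treatment.
treatments = Counter(condense(s) for s in sequences)

for pattern, count in treatments.items():
    print(pattern, count)
```

Grouping by the condensed pattern rather than the raw string is what lets sequences of different lengths count as the same treatment; a real analysis would need the analyst's own encoding and a richer set of patterns.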

## Key findings

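- GWM takes an analyst-provided encoding of events and activities and automatically mines treatment-outcome constructs from it.
- Each mined treatment differs significantly from the others in its impact on the outcome, is structurally different in its sequence of events, or both.
- The authors demonstrate the method on empirical studies of file editing sequences, code ownership, and release cycle time.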

## Abstract

Many investigations in empirical software engineering look at sequences of data resulting from development or management processes. In this paper, we propose an analytical approach called the Gandhi-Washington Method (GWM) to investigate the impact of recurring events in software projects. GWM takes an encoding of events and activities provided by a software analyst as input. It uses regular expressions to automatically condense and summarize information and infer treatments. Relating the treatments to the outcome through statistical tests, treatment-outcome constructs are automatically mined from the data. The output of GWM is a set of treatment-outcome constructs. Each treatment in the set of mined constructs is significantly different from the other treatments considering the impact on the outcome and/or is structurally different from other treatments considering the sequence of events. We describe GWM and classes of problems to which GWM can be applied. We demonstrate the applicability of this method for empirical studies on sequences of file editing, code ownership, and release cycle time.
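The abstract does not say which statistical tests relate treatments to the outcome. As a minimal sketch of that step, the example below compares the outcomes of two treatments with an exact two-sided permutation test on the difference of means, which is a stand-in chosen for this example and not the paper's test. The treatment labels, the outcome values (read here as hypothetical release cycle times in weeks), and the function name `permutation_p_value` are all invented for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means.

    Enumerates every way to split the pooled values into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)

    extreme = 0
    splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in idx)
        diff = abs(a_sum / n_a - (total - a_sum) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


# Hypothetical outcome values per treatment (toy data, not from the paper).
outcomes = {
    "A+B+": [2, 3, 2, 4, 3],
    "B+A+": [7, 8, 6, 9, 8],
}

p = permutation_p_value(outcomes["A+B+"], outcomes["B+A+"])
print(f"p = {p:.4f}")
```

A small p-value here would suggest that the two treatments differ in their effect on the outcome, which is the kind of evidence that would justify keeping them as separate treatment-outcome constructs. With more than two treatments, pairwise comparisons would also need a correction for multiple testing.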

---
Source: https://tomesphere.com/paper/1901.05604