Crossover Designs in Software Engineering Experiments: Review of the State of Analysis
Julian Frattini, Davide Fucci, Sira Vegas

TL;DR
This paper reviews how crossover-design experiments in Software Engineering are analyzed. It finds that analyses have improved, but that many threats to validity remain unaddressed despite existing guidelines.
Contribution
It evaluates how closely SE crossover experiments published between 2015 and March 2024 adhere to the analysis guidelines of Vegas et al., revealing partial compliance and areas needing improvement.
Findings
Validity of data analyses has improved over time.
Only 29.5% of threats to validity are properly addressed.
Carryover effects are rarely modeled, at about 3%.
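To make concrete why leaving carryover unmodeled matters, here is a minimal sketch of the textbook contrasts for a two-sequence, two-period (AB/BA) crossover. It is an illustration under stated assumptions, not the analysis used in the paper or prescribed by the guidelines: the function names, the synthetic noise-free data, and the restriction to an AB/BA layout are all hypothetical choices made for this example.

```python
from statistics import mean


def make_rows(tau_a, tau_b, period_gain, carry_a, carry_b, subject_effects, mu=10.0):
    """Build noise-free synthetic AB/BA data (hypothetical helper for illustration).

    period_gain models a learning effect added to every period-2 score;
    carry_a / carry_b model the residual effect of the period-1 treatment
    on the period-2 score.
    """
    rows = []
    for s in subject_effects:
        # Sequence AB: treatment A in period 1, then B (plus carryover from A).
        rows.append(("AB", mu + s + tau_a, mu + s + tau_b + period_gain + carry_a))
        # Sequence BA: treatment B in period 1, then A (plus carryover from B).
        rows.append(("BA", mu + s + tau_b, mu + s + tau_a + period_gain + carry_b))
    return rows


def analyze_ab_ba(rows):
    """Estimate treatment, period, and carryover contrasts from (sequence, y1, y2) rows."""
    diffs = {"AB": [], "BA": []}
    totals = {"AB": [], "BA": []}
    for sequence, y1, y2 in rows:
        diffs[sequence].append(y1 - y2)   # within-subject difference removes the subject effect
        totals[sequence].append(y1 + y2)  # subject totals carry the carryover contrast
    d_ab, d_ba = mean(diffs["AB"]), mean(diffs["BA"])
    return {
        # Equals (tau_A - tau_B) only if both carryovers are equal; otherwise biased.
        "treatment_A_minus_B": (d_ab - d_ba) / 2,
        # Period contrast; it also absorbs the average carryover.
        "period_2_minus_1": -(d_ab + d_ba) / 2,
        # Difference in carryover between the two treatments.
        "carryover_A_minus_B": mean(totals["AB"]) - mean(totals["BA"]),
    }


if __name__ == "__main__":
    subjects = [-1.0, 0.0, 1.0]
    clean = analyze_ab_ba(make_rows(2.0, 0.0, 1.0, 0.0, 0.0, subjects))
    carried = analyze_ab_ba(make_rows(2.0, 0.0, 1.0, 0.5, 0.0, subjects))
    print("no carryover:  ", clean)
    print("with carryover:", carried)
```

In this sketch the true treatment difference is 2.0 in both runs, yet the estimate drops once treatment A leaves a residual effect on period 2, because the within-subject contrast cannot separate treatment from unequal carryover. The carryover contrast itself relies on between-subject totals, so on real, noisy data it is far less precise than the within-subject treatment contrast.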
Abstract
Experimentation is an essential method for causal inference in any empirical discipline. Crossover-design experiments are common in Software Engineering (SE) research. In these, subjects apply more than one treatment in different orders. This design increases the amount of obtained data and deals with subject variability but introduces threats to internal validity like the learning and carryover effect. Vegas et al. reviewed the state of practice for crossover designs in SE research and provided guidelines on how to address its threats during data analysis while still harnessing its benefits. In this paper, we reflect on the impact of these guidelines and review the state of analysis of crossover design experiments in SE publications between 2015 and March 2024. To this end, by conducting a forward snowballing of the guidelines, we survey 136 publications reporting 67 crossover-design…
Taxonomy
Topics: Software Engineering Techniques and Practices · Software Engineering Research · Software Reliability and Analysis Research
