Towards Causal Analysis of Empirical Software Engineering Data: The Impact of Programming Languages on Coding Competitions
Carlo A. Furia, Richard Torkar, Robert Feldt

TL;DR
This paper explores causal analysis techniques for observational software engineering data, demonstrating how causal models can improve understanding of factors like programming languages' impact on coding competition performance.
Contribution
It introduces novel structural causal model techniques to analyze observational data and applies them to assess programming languages' effects in a large coding contest.
Findings
Weak overall effect of programming languages on performance
Significant differences between causal and correlational analyses
Causal analysis provides more robust insights
Abstract
There is abundant observational data in the software engineering domain, whereas running large-scale controlled experiments is often practically impossible. Thus, most empirical studies can only report statistical correlations -- instead of potentially more insightful and robust causal relations. To support analyzing purely observational data for causal relations, and to assess any differences between purely predictive and causal models of the same data, this paper discusses some novel techniques based on structural causal models (such as directed acyclic graphs of causal Bayesian networks). Using these techniques, one can rigorously express, and partially validate, causal hypotheses; and then use the causal information to guide the construction of a statistical model that captures genuine causal relations -- such that correlation does imply causation. We apply these ideas to analyzing…
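A minimal sketch of why a correlational estimate and a causally adjusted estimate can disagree on the same observational data. Everything here is a hypothetical illustration, not the paper's data, variables, or model: the simulated `experienced` confounder, the probabilities, and the function names are all assumptions. In this toy setup, experience influences both the choice of language and the score, while the language itself has no effect on the score.

```python
import random
from statistics import mean


def simulate(n=20000, seed=0):
    """Generate hypothetical contest rows: (experienced, uses_lang_a, score)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Confounder: whether the participant is experienced (assumed 50/50).
        experienced = rng.random() < 0.5
        # Experienced participants pick language A more often (assumed rates).
        uses_lang_a = rng.random() < (0.8 if experienced else 0.2)
        # The score depends on experience only; language has zero causal effect.
        score = (10.0 if experienced else 0.0) + rng.gauss(0.0, 1.0)
        rows.append((experienced, uses_lang_a, score))
    return rows


def naive_difference(rows):
    """Correlational estimate: mean score with A minus mean score without A."""
    with_a = [s for _, x, s in rows if x]
    without_a = [s for _, x, s in rows if not x]
    return mean(with_a) - mean(without_a)


def adjusted_difference(rows):
    """Stratify on the confounder and average the per-stratum differences,
    weighted by stratum size (a simple backdoor-style adjustment)."""
    total = 0.0
    for level in (True, False):
        stratum = [(x, s) for e, x, s in rows if e == level]
        with_a = [s for x, s in stratum if x]
        without_a = [s for x, s in stratum if not x]
        weight = len(stratum) / len(rows)
        total += weight * (mean(with_a) - mean(without_a))
    return total


rows = simulate()
naive = naive_difference(rows)
adjusted = adjusted_difference(rows)
print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

Under these assumptions the naive difference is large even though the language has no effect, while the adjusted difference is close to zero. Which variables to adjust for is exactly what a causal graph is meant to make explicit; adjusting for the wrong variable can introduce bias rather than remove it.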
Taxonomy
Topics: Software Engineering Research · Bayesian Modeling and Causal Inference · Online Learning and Analytics
