Understanding and Detecting Flaky Builds in GitHub Actions

Wenhao Ge; Chen Zhang

arXiv:2602.02307 · cs.SE · February 3, 2026

TL;DR

This paper presents a large-scale empirical study of flaky builds in GitHub Actions across 1,960 open-source Java projects, identifies the most common causes of flaky failures, and proposes a machine learning method that improves the F1-score of flaky build detection by up to 20.3% over the baseline.

Contribution

The paper provides the first extensive analysis of flaky builds in GitHub Actions and introduces an ML-based detection approach that outperforms the baseline.

Findings

01

3.2% of builds are rerun, and 67.73% of those rerun builds exhibit flaky behavior

02

15 categories of flaky failures identified; flaky tests, network issues, and dependency resolution issues are the most prevalent

03

ML approach improves F1-score by up to 20.3% over baseline

Abstract

Continuous Integration (CI) is widely used to provide rapid feedback on code changes; however, CI build outcomes are not always reliable. Builds may fail intermittently due to non-deterministic factors, leading to flaky builds that undermine developers' trust in CI, waste computational resources, and threaten the validity of CI-related empirical studies. In this paper, we present a large-scale empirical study of flaky builds in GitHub Actions based on rerun data from 1,960 open-source Java projects. Our results show that 3.2% of builds are rerun, and 67.73% of these rerun builds exhibit flaky behavior, affecting 1,055 (51.28%) of the projects. Through an in-depth failure analysis, we identify 15 distinct categories of flaky failures, among which flaky tests, network issues, and dependency resolution issues are the most prevalent. Building on these findings, we propose a machine…
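
As an illustration of how rerun data can surface flaky builds, here is a minimal Python sketch. It assumes a working definition that the abstract does not spell out: a build counts as flaky when it was rerun and its attempts include both a failure and a success. The `Attempt` record, its field names, and the `summarize_reruns` helper are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Attempt(NamedTuple):
    """One execution of a build (hypothetical record shape)."""
    run_id: int       # identifier shared by all attempts of the same build
    attempt: int      # 1 for the original execution, 2 or more for reruns
    conclusion: str   # "success" or "failure"


def summarize_reruns(attempts: Iterable[Attempt]) -> dict:
    """Count builds, rerun builds, and rerun builds with mixed outcomes."""
    by_run = defaultdict(list)
    for a in attempts:
        by_run[a.run_id].append(a)

    rerun = 0
    flaky = 0
    for run_attempts in by_run.values():
        if len(run_attempts) < 2:
            continue  # never rerun, so no evidence either way
        rerun += 1
        outcomes = {a.conclusion for a in run_attempts}
        # Assumed criterion: same build, both a failing and a passing attempt.
        if "success" in outcomes and "failure" in outcomes:
            flaky += 1

    return {"builds": len(by_run), "rerun": rerun, "flaky": flaky}


if __name__ == "__main__":
    sample = [
        Attempt(1, 1, "failure"), Attempt(1, 2, "success"),  # mixed outcomes
        Attempt(2, 1, "failure"), Attempt(2, 2, "failure"),  # consistent failure
        Attempt(3, 1, "success"),                            # never rerun
    ]
    print(summarize_reruns(sample))
```

Under this assumed criterion, a build that fails on every attempt is not counted as flaky, and a build that was never rerun is not classified at all, which is why rerun-based studies report flakiness as a share of rerun builds rather than of all builds.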

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Engineering Techniques and Practices