Where did we fail? -- Reproducing build failures in embedded open source software
Han Fu, Andreas Ermedahl, Sigrid Eldh, Kristian Wiklund, Philipp Haller, Cyrille Artho

TL;DR
This paper introduces PhantomRun, a system that standardizes and replays CI build logs for embedded open source software, enabling reproducibility and analysis of build failures across large datasets.
Contribution
PhantomRun provides a unified framework and dataset for retrieving, storing, and reproducing CI build logs, facilitating large-scale failure analysis in embedded systems.
Findings
Reconstructed 91.8% of 4628 failing CI builds.
Preserved execution outcomes in 98% of evaluated cases.
Logs of reproduced builds closely match the originals, apart from minor nondeterministic differences.
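To make the last finding concrete, here is a minimal sketch of one way to compare an original log with a reproduced one while ignoring run-specific noise. It is not PhantomRun's method: the noise patterns, the placeholder tokens, and the function names `normalize` and `log_similarity` are all assumptions made for illustration.

```python
import difflib
import re

# Hypothetical noise patterns (assumptions, not taken from the paper):
# values that usually change between two runs of the same build.
_NOISE = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<TIMESTAMP>"),
    (re.compile(r"\b\d+(?:\.\d+)?\s?(?:ms|s)\b"), "<DURATION>"),
    (re.compile(r"/tmp/[\w./-]+"), "<TMPPATH>"),
]


def normalize(log: str) -> list[str]:
    """Split a log into lines and mask the noisy fields in each line."""
    lines = []
    for line in log.splitlines():
        for pattern, token in _NOISE:
            line = pattern.sub(token, line)
        lines.append(line.rstrip())
    return lines


def log_similarity(original: str, reproduced: str) -> float:
    """Return a line-level similarity ratio in [0, 1] after normalization."""
    matcher = difflib.SequenceMatcher(
        None, normalize(original), normalize(reproduced), autojunk=False
    )
    return matcher.ratio()


if __name__ == "__main__":
    first = "2024-01-02T03:04:05Z compiling main.c\nbuild took 12.5 s\n"
    second = "2025-06-07T08:09:10Z compiling main.c\nbuild took 13.1 s\n"
    print(log_similarity(first, second))
```

Under these assumptions, two logs that differ only in timestamps, durations, or temporary paths score 1.0, while a changed error message lowers the score. A real comparison would need patterns tuned to each CI runner and toolchain.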
Abstract
Due to hardware-software co-development in embedded systems, continuous integration (CI) builds frequently fail because of complex cross-compilation, board configurations, and toolchain constraints. Although CI build logs contain valuable diagnostic information, they are short-lived and difficult to reuse due to heterogeneous runners, toolchains, and log formats. To address these challenges, we present PhantomRun, a unified abstraction layer and publicly reusable dataset that standardizes the retrieval, storage, and reproduction of CI build logs and metadata. Across 4628 failing CI runs, we reconstructed 91.8% of builds and preserved execution outcomes in 98% of evaluated cases. PhantomRun provides two core capabilities: retrieving the build log of any commit and faithfully re-executing the corresponding build in a controlled environment. By exposing all build artifacts and metadata…
