CI-Repair-Bench: A Repository-Aware Benchmark for Automated Patch Validation via CI Workflows

Rabeya Khatun Muna, Md Nakhla Rafi, and Tse-Hsun (Peter) Chen

arXiv:2604.27148 · cs.SE · May 6, 2026

TL;DR

CI-Repair-Bench is a benchmark built from real GitHub Actions workflows. It evaluates automated program repair methods at the repository level across diverse CI failure types.

Contribution

The paper introduces a benchmark of 567 real CI failures, categorized into 12 error types, and evaluates repair correctness through full CI re-execution, reflecting how patches are validated in real repositories.

Findings

01. Automated repair is most effective for formatting and linting failures.

02. Environment and dependency failures remain challenging for repair methods.

03. The best-performing LLM achieved an 18.9% success rate.
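
The findings above are stated as success rates, overall and by failure type. As a rough illustration of how such rates could be tallied when a repair counts as successful only if the full CI workflow passes, here is a minimal sketch. The `RepairAttempt` record, its field names, the error-type labels, and the toy data are all assumptions made for illustration; they are not the benchmark's actual schema or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairAttempt:
    """One repair attempt on one benchmark instance (hypothetical schema)."""
    instance_id: str
    error_type: str   # illustrative label, e.g. "lint" or "dependency"
    ci_passed: bool   # True only if the full CI workflow passed after the patch


def success_rates(attempts):
    """Return (overall_rate, {error_type: rate}) as fractions in [0, 1]."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for attempt in attempts:
        totals[attempt.error_type] += 1
        passed[attempt.error_type] += attempt.ci_passed  # bool counts as 0 or 1
    per_type = {t: passed[t] / totals[t] for t in totals}
    overall = sum(passed.values()) / sum(totals.values()) if totals else 0.0
    return overall, per_type


# Toy data for illustration only; these are not results from the paper.
attempts = [
    RepairAttempt("a1", "lint", True),
    RepairAttempt("a2", "lint", True),
    RepairAttempt("a3", "dependency", False),
    RepairAttempt("a4", "dependency", False),
    RepairAttempt("a5", "lint", False),
]

overall, per_type = success_rates(attempts)
print(f"overall: {overall:.1%}")
for error_type, rate in sorted(per_type.items()):
    print(f"{error_type}: {rate:.1%}")
```

Grouping by error type before dividing keeps a category with many easy instances from hiding one where no repair passes CI, which is the contrast the first two findings draw.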

Abstract

Continuous Integration (CI) enforces repository-level correctness through multi-stage workflows and is central to modern software development, yet diagnosing and repairing CI failures remains challenging. Unlike traditional program repair, CI failures frequently involve non-code artifacts, environment and dependency issues, noisy execution logs, and workflow-level constraints. Existing program repair benchmarks fall short in this setting: they are largely test-centric, restrict repairs to source code, assume fixed execution environments, and evaluate under simplified CI workflows that do not reflect real repository-level validation. We introduce CI-Repair-Bench, a benchmark for CI-verified, repository-level program repair constructed from real GitHub Actions executions. It contains 567 CI failure instances from 103 repositories and evaluates repair correctness exclusively through full…
