Explaining GitHub Actions Failures with Large Language Models: Challenges, Insights, and Limitations
Pablo Valenzuela-Toledo, Chuyue Wu, Sandro Hernandez, Alexander Boll, Roman Machacek, Sebastiano Panichella, Timo Kehrer

TL;DR
This paper investigates the use of large language models to explain GitHub Actions failures. It shows their potential to help developers understand errors and identifies their current limitations in complex scenarios.
Contribution
It provides an empirical evaluation of LLMs for explaining CI/CD failures, revealing strengths in simple logs and challenges in complex reasoning tasks.
Findings
Over 80% of developers rated LLM explanations positively in terms of correctness for simpler/small logs
LLMs can help reduce manual failure analysis in CI/CD workflows
LLM reasoning remains limited in complex failure scenarios
Abstract
GitHub Actions (GA) has become the de facto tool that developers use to automate software workflows, seamlessly building, testing, and deploying code. Yet when GA fails, it disrupts development, causing delays and driving up costs. Diagnosing failures becomes especially challenging because error logs are often long, complex, and unstructured. Given these difficulties, this study explores the potential of large language models (LLMs) to generate correct, clear, concise, and actionable contextual descriptions (or summaries) for GA failures, focusing on developers' perceptions of their feasibility and usefulness. Our results show that over 80% of developers rated LLM explanations positively in terms of correctness for simpler/small logs. Overall, our findings suggest that LLMs can feasibly assist developers in understanding common GA errors, thus potentially reducing manual analysis.…
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research · Advanced Software Engineering Methodologies
