Heimdallr: Characterizing and Detecting LLM-Induced Security Risks in GitHub CI Workflows
Bonan Ruan, Yeqi Fu, Chuqi Zhang, Jiahao Liu, Jun Zeng, Zhenkai Liang

TL;DR
This paper studies security risks introduced by integrating Large Language Models into GitHub CI workflows, characterizes potential threats, and presents Heimdallr, a framework for detecting such risks with high accuracy.
Contribution
It presents the first study of LLM-induced security risks in GitHub CI workflows and introduces Heimdallr, a hybrid analysis framework for detecting them.
Findings
Heimdallr achieves F1 = 0.994 in LLM-node identification.
Triggerability classification accuracy is 99.8%.
Detected and disclosed 802 vulnerable workflow instances.
Abstract
GitHub Continuous Integration (CI) workflows increasingly integrate Large Language Models (LLMs) to automate review, triage, content generation, and repository maintenance. This creates a new attack surface: externally controllable workflow inputs can shape LLM prompts and outputs, which may in turn affect security decisions, repository state, or privileged execution. Although LLM security and CI security have each been studied extensively, their intersection remains underexplored. In this paper, we present the first study of LLM-induced security risks in GitHub CI workflows. We characterize the problem along the full execution chain and develop a taxonomy of high-level risk classes and concrete threat vectors. To detect such risks in practice, we design Heimdallr, a hybrid analysis framework that normalizes workflows into an LLM-Workflow Property Graph (L-WPG) and combines…
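The attack surface described in the abstract, where externally controllable workflow inputs reach an LLM prompt, can be illustrated with a small sketch. This is not Heimdallr's analysis or its L-WPG representation. It is a naive pattern check over an already-parsed workflow, written only to make the risk concrete. The marker list, the helper names, and the `example-org/llm-triage-action` action are hypothetical. The `github.event.*` fields are real GitHub Actions context values that an outside contributor can set.

```python
import re

# GitHub event fields that an external contributor can control.
# Illustrative subset only, not an exhaustive list.
UNTRUSTED_CONTEXTS = (
    "github.event.issue.title",
    "github.event.issue.body",
    "github.event.pull_request.title",
    "github.event.pull_request.body",
    "github.event.comment.body",
)

# Hypothetical keyword heuristic for spotting steps that call an LLM.
LLM_MARKERS = ("openai", "anthropic", "llm", "gpt")

# Matches GitHub Actions expressions of the form ${{ ... }}.
EXPR = re.compile(r"\$\{\{\s*(.*?)\s*\}\}")


def is_llm_step(step):
    """Guess whether a step invokes an LLM, based on keywords only."""
    text = " ".join(str(step.get(k, "")) for k in ("uses", "run", "name")).lower()
    return any(marker in text for marker in LLM_MARKERS)


def untrusted_refs(step):
    """Return the untrusted contexts interpolated into a step's inputs."""
    values = [str(step.get("run", ""))]
    values += [str(v) for v in step.get("with", {}).values()]
    values += [str(v) for v in step.get("env", {}).values()]
    found = set()
    for value in values:
        for expr in EXPR.findall(value):
            for ctx in UNTRUSTED_CONTEXTS:
                if ctx in expr:
                    found.add(ctx)
    return sorted(found)


def find_risky_steps(workflow):
    """List (job name, step index, contexts) for LLM steps fed untrusted input."""
    findings = []
    for job_name, job in workflow.get("jobs", {}).items():
        for index, step in enumerate(job.get("steps", [])):
            if is_llm_step(step):
                refs = untrusted_refs(step)
                if refs:
                    findings.append((job_name, index, refs))
    return findings


# A workflow as it might look after YAML parsing (hypothetical example).
example_workflow = {
    "on": {"issues": {"types": ["opened"]}},
    "jobs": {
        "triage": {
            "steps": [
                {"uses": "actions/checkout@v4"},
                {
                    "name": "Ask LLM to label the issue",
                    "uses": "example-org/llm-triage-action@v1",  # hypothetical action
                    "with": {"prompt": "Classify: ${{ github.event.issue.body }}"},
                },
            ]
        }
    },
}

print(find_risky_steps(example_workflow))
```

A keyword heuristic like this would miss indirect flows (for example, an issue body written to a file by one step and read by a later one) and would say nothing about what the LLM output goes on to affect, which is why the abstract frames the problem along the full execution chain.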
