Agentic Harness for Real-World Compilers

Yingwei Zheng; Cong Li; Shaohua Li; Yuqun Zhang; Zhendong Su

arXiv:2603.20075·cs.SE·March 23, 2026

Open Access

TL;DR

This paper introduces llvm-autofix, an agentic framework designed to improve large language models' ability to understand and fix LLVM compiler bugs, addressing the unique challenges of compiler bug repair.

Contribution

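We present llvm-autofix, the first agentic harness specialized for LLM-based repair of compiler bugs. It comprises agent-friendly LLVM tools, the llvm-bench benchmark of reproducible LLVM bugs, and a minimal agent, llvm-autofix-mini, that outperforms the state of the art.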

Findings

01. Frontier models show a 60% performance decline on compiler bugs compared with common software bugs.

02. llvm-autofix-mini outperforms the state of the art by 22%.

03. The harness establishes a foundation for applying LLMs to complex compiler systems.

Abstract

Compilers are critical to modern computing, yet fixing compiler bugs is difficult. While recent large language model (LLM) advancements enable automated bug repair, compiler bugs pose unique challenges due to their complexity, deep cross-domain expertise requirements, and sparse, non-descriptive bug reports, necessitating compiler-specific tools. To bridge the gap, we introduce llvm-autofix, the first agentic harness designed to assist LLM agents in understanding and fixing compiler bugs. Our focus is on LLVM, one of the most widely used compiler infrastructures. Central to llvm-autofix are agent-friendly LLVM tools, a benchmark llvm-bench of reproducible LLVM bugs, and a tailored minimal agent llvm-autofix-mini for fixing LLVM bugs. Our evaluation demonstrates a performance decline of 60% in frontier models when tackling compiler bugs compared with common software bugs. Our minimal…

Taxonomy

Topics: Software Testing and Debugging Techniques · Logic, programming, and type systems · Software System Performance and Reliability