REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving

Annabelle Sujun Tang; Christopher Priebe; Rohan Mahapatra; Lianhui Qin; Hadi Esmaeilzadeh

arXiv:2506.01374 · cs.LG · February 5, 2026

TL;DR

This paper introduces REASONING COMPILER, a framework that uses large language models together with structured Monte Carlo tree search (MCTS) to guide compiler optimizations for neural model serving, achieving speedups with fewer search samples.
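
To make the LLM-plus-MCTS idea concrete, here is a minimal sketch of PUCT-style tree search over sequences of program transformations. A proposal function supplies priors for each expansion, and every iteration spends exactly one "sample" (one cost-model evaluation). It rests on several assumptions that do not come from the paper: the transformation names, the toy latency model, and `llm_propose`, which is a hand-written heuristic standing in for a real LLM call. Treat it as an illustration of the general technique, not the authors' implementation.

```python
import math
from dataclasses import dataclass, field

# Hypothetical transformation vocabulary (not the paper's actual search space).
TRANSFORMS = ["reorder", "tile", "vectorize", "unroll"]
MAX_DEPTH = 3
BASELINE = 100.0


def toy_latency(seq):
    """Hypothetical cost model (lower is better) whose steps interact."""
    cost = BASELINE
    for i, t in enumerate(seq):
        before = seq[:i]
        if t in before:
            cost += 5.0  # redundant pass adds overhead
        elif t == "tile":
            cost -= 30.0 if "reorder" in before else 0.0  # tiling needs reordering first
        elif t == "vectorize":
            cost += -40.0 if "tile" in before else 10.0  # vectorizing untiled loops hurts
        else:  # "reorder" or "unroll"
            cost -= 5.0
    return cost


def llm_propose(seq):
    """Stand-in for an LLM call: a heuristic prior over the next transformation.
    A real system would query a model here."""
    scores = {t: 1.0 for t in TRANSFORMS}
    for step in ("reorder", "tile", "vectorize"):
        if step not in seq:
            scores[step] += 2.0  # favor the first missing step of the pipeline
            break
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}


@dataclass
class Node:
    seq: tuple
    prior: float = 1.0
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def mcts(n_samples, c_puct=1.5):
    """PUCT-style MCTS; each iteration spends one cost-model sample."""
    root = Node(seq=())
    best_seq, best_speedup = (), 1.0
    for _ in range(n_samples):
        node, path = root, [root]
        # Selection: pick the child maximizing Q plus a prior-weighted exploration bonus.
        while node.children:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.q() + c_puct * ch.prior
                       * math.sqrt(parent.visits) / (1 + ch.visits))
            path.append(node)
        # Expansion: the proposal function supplies priors for the new children.
        if len(node.seq) < MAX_DEPTH:
            for t, p in llm_propose(node.seq).items():
                node.children[t] = Node(seq=node.seq + (t,), prior=p)
        # Evaluation: one "sample" is one latency measurement of this schedule.
        speedup = BASELINE / toy_latency(node.seq)
        if speedup > best_speedup:
            best_seq, best_speedup = node.seq, speedup
        # Backpropagation: credit every node on the selected path.
        for n in path:
            n.visits += 1
            n.value_sum += speedup
    return best_seq, best_speedup


best, speedup = mcts(n_samples=20)
print(best, speedup)  # → ('reorder', 'tile', 'vectorize') 4.0
```

In this toy setting, the informative prior steers the search to the single best three-step sequence after only four samples. The design point is that the prior changes which branches get explored first, while the measured rewards still decide which branches get revisited.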

Contribution

It presents a new LLM-guided, context-aware compiler optimization method that enhances sample efficiency without retraining, outperforming existing neural compiler techniques.
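
One way to use an off-the-shelf LLM "without retraining" while staying context-aware is to serialize the compilation context (transformations applied so far, plus schedules already measured) into the prompt, then parse the model's free-form answer back into a legal move. The sketch below assumes a hypothetical transformation vocabulary, a made-up prompt wording, and a `NEXT: <name>` reply convention. `llm` is any string-to-string callable and is not a specific vendor API. It is not the paper's actual prompt format.

```python
import re

# Hypothetical transformation vocabulary; a real compiler defines its own.
LEGAL = ["reorder", "tile", "vectorize", "unroll"]


def build_prompt(current, history, legal=LEGAL):
    """Serialize the compilation context (current schedule plus measured
    history) into a prompt so a frozen LLM can condition on it."""
    lines = [
        "You are optimizing a tensor program.",
        f"Transformations applied so far: {', '.join(current) or '(none)'}",
        "Previously measured schedules (latency, lower is better):",
    ]
    # List the best schedules first so the model sees what has worked.
    for seq, lat in sorted(history, key=lambda h: h[1]):
        lines.append(f"  {' -> '.join(seq) or '(baseline)'}: {lat:.1f}")
    lines.append(f"Legal next transformations: {', '.join(legal)}")
    lines.append("Reason step by step, then finish with 'NEXT: <transformation>'.")
    return "\n".join(lines)


def parse_choice(reply, legal=LEGAL, fallback=None):
    """Take the last 'NEXT: <name>' in a free-form reply; reject illegal moves."""
    matches = re.findall(r"NEXT:\s*([A-Za-z_]+)", reply)
    if matches and matches[-1].lower() in legal:
        return matches[-1].lower()
    return fallback


def propose(llm, current, history, fallback="unroll"):
    """`llm` is any callable mapping a prompt string to a reply string
    (a hypothetical stand-in for whatever model API is used)."""
    return parse_choice(llm(build_prompt(current, history)), fallback=fallback)


def stub(prompt):
    """Fake model used only for this example."""
    return "Reordering is done, so tile for locality. NEXT: tile"


print(propose(stub, ("reorder",), [((), 100.0), (("reorder",), 95.0)]))  # → tile
```

The fallback matters because a model's output is not guaranteed to name a legal transformation. Validating the reply keeps the search well-defined no matter what the model returns.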

Findings

01. Achieves significant speedups with fewer samples.

02. Leverages LLM reasoning for context-aware optimization.

03. Outperforms existing neural compiler methods.

Abstract

While model serving has unlocked unprecedented capabilities, the high cost of serving large-scale models continues to be a significant barrier to widespread accessibility and rapid innovation. Compiler optimizations have long driven substantial performance improvements, but existing compilers struggle with neural workloads due to the exponentially large and highly interdependent space of possible transformations. Although existing stochastic search techniques can be effective, they are often sample-inefficient and fail to leverage the structural context underlying compilation decisions. We set out to investigate the research question of whether reasoning with large language models (LLMs), without any retraining, can leverage the context-aware decision space of compiler optimizations to significantly improve sample efficiency. To that end, we introduce a novel compilation framework…
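
The abstract's two difficulties, an exponentially large space and interdependent transformations, can be shown with a toy example. The sketch below counts ordered transformation sequences, shows that the same two steps give different latencies depending on their order, and implements a uniform random-search baseline as a simple stand-in for the stochastic search the abstract calls sample-inefficient. The transformation names and the latency model are hypothetical assumptions, not the paper's search space or cost model.

```python
import random

# Hypothetical transformation vocabulary and toy cost model (not the paper's).
TRANSFORMS = ["reorder", "tile", "vectorize", "unroll"]


def toy_latency(seq):
    """Lower is better; the benefit of a step depends on earlier steps."""
    cost = 100.0
    for i, t in enumerate(seq):
        before = seq[:i]
        if t in before:
            cost += 5.0  # redundant pass adds overhead
        elif t == "tile":
            cost -= 30.0 if "reorder" in before else 0.0  # tiling needs reordering first
        elif t == "vectorize":
            cost += -40.0 if "tile" in before else 10.0  # vectorizing untiled loops hurts
        else:  # "reorder" or "unroll"
            cost -= 5.0
    return cost


def space_size(n_transforms, depth):
    """Number of ordered sequences of exactly `depth` steps (repeats allowed)."""
    return n_transforms ** depth


def random_search(budget, depth=3, seed=0):
    """Uniform random baseline: draw `budget` sequences and keep the best.
    Returns (best_seq, best_latency, sample_index_of_best)."""
    rng = random.Random(seed)
    best_seq, best_lat, when = None, float("inf"), 0
    for i in range(1, budget + 1):
        seq = tuple(rng.choice(TRANSFORMS) for _ in range(depth))
        lat = toy_latency(seq)
        if lat < best_lat:
            best_seq, best_lat, when = seq, lat, i
    return best_seq, best_lat, when


# The space grows exponentially with sequence length.
print(space_size(4, 3), space_size(20, 10))  # → 64 10240000000000
# Order matters: the same two steps give different latencies.
print(toy_latency(("reorder", "tile")), toy_latency(("tile", "reorder")))  # → 65.0 95.0
```

In this toy model, exactly one of the 64 three-step sequences reaches the minimum latency of 25.0. A uniform draw therefore finds it with probability 1/64, so random search needs 64 samples on average. Context-aware guidance aims to cut that count by ruling out orderings that cannot pay off.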

Code & Models

Repositories

Anna-Bele/LLM_MCTS_Search

Videos

REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving · SlidesLive

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Software Testing and Debugging Techniques · Distributed and Parallel Computing Systems

Methods: Sparse Evolutionary Training