Compiling Code LLMs into Lightweight Executables

Jieke Shi; Junda He; Zhou Yang; Chengran Yang; Mykhailo Klymenko; Thong Hoang (James); Xiwei Xu (Sherry); Zhenchang Xing; and David Lo

arXiv:2603.29813·cs.SE·April 27, 2026

TL;DR

Ditto is a framework that compresses Code LLMs and compiles them into lightweight executables, enabling efficient local deployment on commodity hardware with minimal accuracy loss.

Contribution

Ditto introduces a novel combination of quantization and LLVM-based compilation to optimize Code LLMs for local execution on resource-constrained devices.
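
To make the quantization half of this combination concrete, here is a minimal sketch of what weight quantization does in general. This summary does not say which quantization scheme Ditto uses, so the symmetric per-tensor int8 scheme below, along with the function names `quantize_int8` and `dequantize_int8`, are assumptions chosen purely for illustration and should not be read as the authors' method.

```python
# Illustrative only: a generic symmetric int8 scheme, NOT Ditto's actual method.

def quantize_int8(weights):
    """Map a list of floats to int8 codes in [-127, 127] plus one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to exercise the functions.
weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as a single byte plus one shared floating-point scale, which is where the size reduction in schemes of this kind comes from; the cost is a rounding error of at most half a quantization step per weight.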

Findings

01 Achieves up to 10.5× faster inference

02 Reduces memory usage by 6.4×

03 Lowers energy consumption by 10.5×

Abstract

The demand for better prediction accuracy and higher execution performance in neural networks continues to grow. The emergence and success of Large Language Models (LLMs) have produced many cloud-based tools for software engineering tasks such as code suggestion. Although effective, cloud deployment raises concerns over privacy, latency, and reliance on network connectivity. Running LLMs locally on personal devices such as laptops would address these issues, because it enables offline use and reduces response time. However, local deployment is challenging, since commodity devices lack high-performance accelerators such as GPUs and are constrained by limited memory and compute capacity, which makes it hard to execute large models efficiently. We present Ditto, a framework that optimizes both the model size of Code LLMs and the inference programs that execute them. Our approach…
