Small Language Models as Compiler Experts: Auto-Parallelization for Heterogeneous Systems
Prathamesh Devadiga

TL;DR
This paper demonstrates that small (approximately 1B parameter) language models can serve as compiler experts for auto-parallelization, achieving an average speedup of 6.81x across 11 real-world kernels on heterogeneous systems.
Contribution
It introduces a novel approach using small language models for auto-parallelization, outperforming traditional compiler heuristics on heterogeneous hardware.
Findings
Average speedup of 6.81x across benchmarks
Peak performance of 43.25x on convolution operations
Robustness verified across multiple hardware platforms
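As a reading aid for the average and peak figures above, the following is a minimal sketch of how per-kernel speedups are commonly summarized. The function names, kernel labels, and timings are hypothetical and are not taken from the paper. The listing does not say whether the reported average is an arithmetic or a geometric mean, so the sketch computes both.

```python
from statistics import mean, geometric_mean


def speedup(baseline_seconds, optimized_seconds):
    """Speedup of the optimized run relative to the baseline run."""
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / optimized_seconds


def summarize(timings):
    """Summarize speedups for a mapping of kernel -> (baseline_s, optimized_s)."""
    speedups = {name: speedup(b, o) for name, (b, o) in timings.items()}
    peak_kernel = max(speedups, key=speedups.get)
    values = list(speedups.values())
    return {
        "mean": mean(values),               # arithmetic mean of speedups
        "geomean": geometric_mean(values),  # less sensitive to one large outlier
        "peak": speedups[peak_kernel],
        "peak_kernel": peak_kernel,
    }


# Hypothetical timings in seconds, for illustration only (not the paper's data).
example_timings = {
    "kernel_a": (8.0, 2.0),
    "kernel_b": (9.0, 1.0),
    "kernel_c": (3.0, 3.0),
}
summary = summarize(example_timings)
print(summary)
```

When one kernel has a very large speedup, the arithmetic mean is pulled toward it far more than the geometric mean, so it helps to know which of the two a reported average uses.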
Abstract
Traditional auto-parallelizing compilers, reliant on rigid heuristics, struggle with the complexity of modern heterogeneous systems. This paper presents a comprehensive evaluation of small (approximately 1B parameter) language-model-driven compiler auto-parallelization. We evaluate three models: gemma3, llama3.2, and qwen2.5, using six reasoning strategies across 11 real-world kernels drawn from scientific computing, graph algorithms, and machine learning. Our system is benchmarked against strong compiler baselines, including LLVM Polly, TVM, and Triton. Across 376 total evaluations, the proposed approach achieves an average speedup of 6.81x and a peak performance of 43.25x on convolution operations. We analyze scalability, verify correctness using multiple sanitizers, and confirm robustness across diverse compilers and hardware platforms. Our results demonstrate that small, efficient…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Big Data and Digital Economy · Natural Language Processing Techniques
