Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation
Daichi Mukunoki, Shun-ichiro Hayashi, Tetsuya Hoshino, Takahiro Katagiri

TL;DR
This paper evaluates how well general-purpose large language models, specifically GPT-4.1 and o4-mini, can generate CPU code for Basic Linear Algebra Subprograms (BLAS) routines. It shows that they produce correct code in many cases, and partially optimized code, from minimal input.
Contribution
It demonstrates that general-purpose LLMs can often generate correct BLAS routines, with some basic optimizations applied, from minimal prompts, which highlights their potential in scientific computing.
Findings
Correct code generated in many cases from routine names alone
Basic optimizations like threading and vectorization are partially implemented
Generated code can outperform reference implementations in speed
Abstract
Generative AI technology based on Large Language Models (LLMs) has been developed and applied to assist with or automatically generate program code. In this paper, we evaluate the capability of existing general LLMs for Basic Linear Algebra Subprograms (BLAS) code generation for CPUs. We use two LLMs provided by OpenAI: GPT-4.1, a Generative Pre-trained Transformer (GPT) model, and o4-mini, one of the o-series of reasoning models. Both were released in April 2025. For the routines from level-1 to level-3 BLAS, we tried to generate (1) C code without optimization from the routine name only, (2) C code with basic performance optimizations (thread parallelization, SIMD vectorization, and cache blocking) from the routine name only, and (3) C code with basic performance optimizations based on Fortran reference code. As a result, we found that correct code can be generated in many cases even when only…
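To make the difference between tasks (1) and (2) concrete, here is a minimal hand-written sketch for the level-1 routine DAXPY (y := alpha * x + y). It is not output from either model and not code from the paper. The function names `daxpy_ref` and `daxpy_opt` are hypothetical, and the optimized variant assumes an OpenMP-capable compiler (for example, `-fopenmp` with GCC or Clang). Without that flag the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>

/* Task (1) style: plain DAXPY, y := alpha * x + y, with BLAS-style strides.
 * Hypothetical name; illustrative only. */
void daxpy_ref(int n, double alpha, const double *x, int incx,
               double *y, int incy)
{
    if (n <= 0 || alpha == 0.0) return;
    /* For a negative increment, reference BLAS walks from the far end. */
    ptrdiff_t ix = (incx < 0) ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = (incy < 0) ? (ptrdiff_t)(1 - n) * incy : 0;
    for (int i = 0; i < n; ++i) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}

/* Task (2) style: same result, with thread parallelization and SIMD
 * vectorization requested on the unit-stride path. Hypothetical name. */
void daxpy_opt(int n, double alpha, const double *x, int incx,
               double *y, int incy)
{
    if (n <= 0 || alpha == 0.0) return;
    if (incx == 1 && incy == 1) {
        /* Contiguous data: split iterations across threads, vectorize each chunk. */
        #pragma omp parallel for simd
        for (int i = 0; i < n; ++i)
            y[i] += alpha * x[i];
    } else {
        /* Strided access: fall back to the plain loop. */
        daxpy_ref(n, alpha, x, incx, y, incy);
    }
}
```

Cache blocking, the third optimization named in the abstract, is not shown because DAXPY touches each element once; it matters mainly for level-3 routines such as matrix multiplication.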
