Evaluating Efficiency and Novelty of LLM-Generated Code for Graph Analysis
Atieh Barati Nia, Mohammad Dindoost, David A. Bader

TL;DR
This study systematically evaluates how well large language models can generate efficient C code for graph analysis, focusing on performance, correctness, and potential for algorithm innovation.
Contribution
It is one of the first comprehensive assessments of LLMs in generating high-performance graph algorithms, highlighting their strengths and limitations.
Findings
Claude Sonnet 4 Extended outperforms human-written baselines in triangle counting
LLMs excel at optimizing existing algorithms for efficiency
Potential for LLMs to invent novel algorithms remains promising
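For readers unfamiliar with the triangle-counting task named in the findings, the following is a minimal C sketch of one standard approach: intersecting sorted adjacency lists over a compressed sparse row (CSR) graph. It is not taken from the paper and is not one of the human-written or LLM-generated implementations it evaluates. The function name `count_triangles_csr`, the CSR layout, and the input assumptions (undirected simple graph, both directions of each edge stored, each adjacency list sorted ascending) are ours, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative sketch (hypothetical helper, not the paper's code).
 * Counts triangles in an undirected simple graph stored in CSR form.
 * Assumptions: every edge {u, v} appears in both adjacency lists,
 * each adjacency list is sorted ascending, and there are no
 * self-loops or duplicate edges.
 *
 * row_ptr has n + 1 entries; the neighbors of vertex u are
 * col_idx[row_ptr[u]] .. col_idx[row_ptr[u + 1] - 1].
 */
uint64_t count_triangles_csr(size_t n, const size_t *row_ptr,
                             const uint32_t *col_idx)
{
    assert(row_ptr != NULL);

    uint64_t triangles = 0;
    for (size_t u = 0; u < n; u++) {
        for (size_t e = row_ptr[u]; e < row_ptr[u + 1]; e++) {
            uint32_t v = col_idx[e];
            if (v <= u) {
                continue; /* visit each undirected edge once, as u < v */
            }

            /* Merge-style intersection of N(u) and N(v). */
            size_t i = row_ptr[u], i_end = row_ptr[u + 1];
            size_t j = row_ptr[v], j_end = row_ptr[v + 1];
            while (i < i_end && j < j_end) {
                uint32_t a = col_idx[i];
                uint32_t b = col_idx[j];
                if (a < b) {
                    i++;
                } else if (a > b) {
                    j++;
                } else {
                    /* Common neighbor w; count only u < v < w so that
                     * each triangle is counted exactly once. */
                    if (a > v) {
                        triangles++;
                    }
                    i++;
                    j++;
                }
            }
        }
    }
    return triangles;
}
```

The ordering constraint u < v < w avoids dividing the total by a constant afterward. A performance-oriented implementation would typically add degree-based vertex ordering, parallelism, or cache-aware layouts, none of which this sketch attempts.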
Abstract
Large Language Models (LLMs) are increasingly used to automate software development, yet most prior evaluations focus on functional correctness or high-level languages such as Python. As one of the first systematic explorations of LLM-assisted software performance engineering, we present a comprehensive study of LLMs' ability to generate efficient C implementations of graph-analysis routines -- code that must satisfy stringent runtime and memory constraints. This emerging field of LLM-assisted algorithm engineering holds significant promise, as these models may possess the capability to design novel approaches that improve existing algorithms and their implementations. Eight state-of-the-art models (OpenAI ChatGPT o3 and o4-mini-high, Anthropic Claude 4 Sonnet and Sonnet Extended, Google Gemini 2.5 Flash and Pro, xAI Grok 3-Think, and DeepSeek DeepThink R1) are benchmarked using two…
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Model-Driven Software Engineering Techniques
