From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation
Linus Bantel, Moritz Strack, Alexander Strack, Dirk Pflüger

TL;DR
This paper evaluates how well large language models generate task-based parallel code across different frameworks and input prompts, revealing their strengths and limitations in high-performance computing contexts.
Contribution
It systematically assesses LLMs' ability to produce correct and scalable parallel code from various prompts and frameworks, highlighting areas for improvement.
Findings
LLMs can generate correct parallel code for simple problems
Performance varies significantly with problem complexity and framework
Certain frameworks are more amenable to LLM-generated code
Abstract
Large Language Models (LLMs) show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied. This paper explores how LLMs generate task-based parallel code from three kinds of input prompts: natural language problem descriptions, sequential reference implementations, and parallel pseudocode. We focus on three programming frameworks: OpenMP Tasking, C++ standard parallelism, and the asynchronous many-task runtime HPX. Each framework offers a different level of abstraction and control for task execution. We evaluate LLM-generated solutions for correctness and scalability. Our results reveal both strengths and weaknesses of LLMs with regard to problem complexity and framework. Finally, we discuss what these findings mean for future LLM-assisted development in high-performance and scientific computing.
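To make "task-based parallel code" concrete, here is a minimal sketch of the spawn-and-join pattern in standard C++ using std::async. It is not taken from the paper: the function name task_sum, the cutoff parameter, and the choice of a recursive array sum are assumptions made purely for illustration. OpenMP Tasking and HPX express the same shape with their own constructs (for example `#pragma omp task` with `taskwait`, or `hpx::async` with futures).

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical example: sum data[begin, end) by recursively splitting the
// range into two halves. The left half runs as an asynchronous task, the
// right half runs on the calling thread, and the results are joined.
long long task_sum(const std::vector<int>& data, std::size_t begin,
                   std::size_t end, std::size_t cutoff = 1024) {
    assert(begin <= end && end <= data.size());
    const std::size_t n = end - begin;

    // Below the cutoff (or when the range cannot be split further),
    // fall back to a sequential sum to limit task-creation overhead.
    if (n <= cutoff || n < 2) {
        return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
    }

    const std::size_t mid = begin + n / 2;

    // Spawn: the left half becomes a task with its own thread of execution.
    std::future<long long> left = std::async(
        std::launch::async,
        [&data, begin, mid, cutoff] { return task_sum(data, begin, mid, cutoff); });

    // The current thread keeps working on the right half.
    const long long right = task_sum(data, mid, end, cutoff);

    // Join: wait for the spawned task and combine both partial sums.
    return left.get() + right;
}
```

A call such as `task_sum(values, 0, values.size())` sums the whole vector. The cutoff controls task granularity: a value that is too small creates many short-lived tasks, which is one of the scalability pitfalls that an evaluation of generated parallel code would need to watch for.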
Taxonomy
Topics: Natural Language Processing Techniques · Parallel Computing and Optimization Techniques · Software Engineering Research
