ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks

Joshua H. Davis; Daniel Nichols; Ishan Khillan; Abhinav Bhatele

arXiv:2506.20938 · cs.DC · September 8, 2025

TL;DR

This paper introduces ParEval-Repo, a benchmark suite for evaluating large language models' ability to automatically translate entire GPGPU codebases across different execution models, highlighting current capabilities and challenges.

Contribution

The paper presents a novel benchmark and testing framework, ParEval-Repo, for assessing LLMs in translating complex, repository-level scientific computing codebases across GPGPU programming models.

Findings

1. LLMs can effectively translate small scientific programs.
2. Scaling to larger codebases remains challenging due to build system and dependency issues.
3. Translation cost correlates with code complexity.

Abstract

GPGPU architectures have become significantly more diverse in recent years, which has led to the emergence of a variety of specialized programming models and software stacks to support them. Portable programming models exist, but they require significant developer effort to port to and optimize for different hardware architectures. Large language models (LLMs) may help to reduce this programmer burden. In this paper, we present a novel benchmark and testing framework, ParEval-Repo, which can be used to evaluate the efficacy of LLM-based approaches in automatically translating entire codebases across GPGPU execution models. ParEval-Repo includes several scientific computing and AI mini-applications in a range of programming models and levels of repository complexity. We use ParEval-Repo to evaluate a range of state-of-the-art open-source and commercial LLMs, with both a non-agentic and a…

Taxonomy

Topics: Natural Language Processing Techniques · Research Data Management Practices · Scientific Computing and Data Management