Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation

Pedro Valero-Lara; Alexis Huante; Mustafa Al Lail; William F. Godoy; Keita Teranishi; Prasanna Balaprakash; Jeffrey S. Vetter

arXiv:2309.07103·cs.SE·September 14, 2023·5 cites

TL;DR

This paper compares Llama-2 with a GPT-3-based baseline (OpenAI Codex via GitHub Copilot) at generating high-performance computing kernels across several programming languages and parallel programming models, and reports differences in the reliability and optimization of the generated code.

Contribution

It provides a comparative analysis of Llama-2 and GPT-3 for HPC kernel generation, highlighting Llama-2's competitive accuracy and optimization capabilities.

Findings

01

Llama-2 shows competitive or superior accuracy to GPT-3.

02

Copilot generates more reliable but less optimized code.

03

Llama-2 produces less reliable but more optimized code.

Abstract

We evaluate the use of the open-source Llama-2 model for generating well-known, high-performance computing kernels (e.g., AXPY, GEMV, GEMM) on different parallel programming models and languages (e.g., C++: OpenMP, OpenMP Offload, OpenACC, CUDA, HIP; Fortran: OpenMP, OpenMP Offload, OpenACC; Python: numpy, Numba, pyCUDA, cuPy; and Julia: Threads, CUDA.jl, AMDGPU.jl). We built upon our previous work that is based on the OpenAI Codex, which is a descendant of GPT-3, to generate similar kernels with simple prompts via GitHub Copilot. Our goal is to compare the accuracy of Llama-2 and our original GPT-3 baseline by using a similar metric. Llama-2 has a simplified model that shows competitive or even superior accuracy. We also report on the differences between these foundational large language models as generative AI continues to redefine human-computer interactions. Overall, Copilot…
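For readers unfamiliar with the kernels named in the abstract, the following is a minimal reference sketch of AXPY, GEMV, and GEMM using their standard BLAS definitions. It is written in plain, serial Python to stay dependency-free; it is not code from the paper, not model output, and not one of the parallel variants the paper evaluates. The function names and the list-of-rows matrix layout are assumptions made for illustration.

```python
# Reference (serial, unoptimized) definitions of the three BLAS-style kernels.
# Matrices are plain lists of rows; this layout is an illustrative assumption.

def axpy(alpha, x, y):
    """Return alpha * x + y for two vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(alpha, A, x, beta, y):
    """Return alpha * (A x) + beta * y for an m-by-n matrix A."""
    if len(A) != len(y) or any(len(row) != len(x) for row in A):
        raise ValueError("shape mismatch between A, x, and y")
    return [
        alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
        for row, yi in zip(A, y)
    ]


def gemm(alpha, A, B, beta, C):
    """Return alpha * (A B) + beta * C for m-by-k A, k-by-n B, m-by-n C."""
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    if len(C) != m or any(len(row) != n for row in C):
        raise ValueError("C must be m-by-n")
    return [
        [
            alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]
```

A serial reference like this can serve as a correctness check for a generated kernel: run both on the same small inputs and compare the results within a floating-point tolerance.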

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Softmax · Linear Warmup With Cosine Annealing · Dense Connections · Layer Normalization