Improving MPI Error Detection and Repair with Large Language Models and Bug References

Scott Piersall; Yang Gao; Shenyang Liu; Liqiang Wang

arXiv:2604.02398 · cs.SE · April 6, 2026

TL;DR

This paper enhances large language models for MPI error detection and repair by integrating Few-Shot Learning, Chain-of-Thought reasoning, and Retrieval-Augmented Generation, raising error detection accuracy from 44% to 77%.

Contribution

The paper introduces a bug detection and repair approach that augments LLMs with bug references, Few-Shot Learning, Chain-of-Thought reasoning, and Retrieval-Augmented Generation, substantially improving accuracy on MPI error detection and repair.

Findings

01. Error detection accuracy improved from 44% to 77%.

02. The bug referencing technique generalizes well to other LLMs.

03. The enhanced methods outperform applying ChatGPT directly.

Abstract

Message Passing Interface (MPI) is a foundational technology in high-performance computing (HPC), widely used for large-scale simulations and distributed training (e.g., in machine learning frameworks such as PyTorch and TensorFlow). However, maintaining MPI programs remains challenging due to the complex interplay among processes and the intricacies of message passing and synchronization. With the advancement of large language models like ChatGPT, it is tempting to adopt such technology for automated error detection and repair. Yet, our studies reveal that directly applying large language models (LLMs) yields suboptimal results, largely because these models lack essential knowledge about correct and incorrect usage, particularly the bugs found in MPI programs. In this paper, we design a bug detection and repair technique alongside Few-Shot Learning (FSL), Chain-of-Thought (CoT)…
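To make the idea of combining bug references with few-shot examples and step-by-step reasoning more concrete, here is a minimal Python sketch of how such a prompt might be assembled. It is not the authors' implementation: the `BugReference` type, the three catalog entries, the token-overlap retrieval, and the prompt wording are all assumptions made for illustration, and no LLM is actually called.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BugReference:
    """Hypothetical catalog entry: a known MPI bug pattern with buggy and fixed snippets."""
    name: str
    description: str
    buggy: str
    fixed: str


# Illustrative entries only (assumption); these are not taken from the paper's data.
CATALOG = [
    BugReference(
        name="blocking-send deadlock",
        description="Two ranks both call MPI_Send before MPI_Recv and may block forever.",
        buggy="MPI_Send(a, n, MPI_INT, peer, 0, comm); MPI_Recv(b, n, MPI_INT, peer, 0, comm, &st);",
        fixed="MPI_Sendrecv(a, n, MPI_INT, peer, 0, b, n, MPI_INT, peer, 0, comm, &st);",
    ),
    BugReference(
        name="datatype mismatch",
        description="The sender uses MPI_INT while the receiver expects MPI_DOUBLE.",
        buggy="MPI_Recv(buf, n, MPI_DOUBLE, src, 0, comm, &st);  /* sender used MPI_INT */",
        fixed="MPI_Recv(buf, n, MPI_INT, src, 0, comm, &st);",
    ),
    BugReference(
        name="missing wait",
        description="A buffer passed to MPI_Isend is reused before MPI_Wait completes the request.",
        buggy="MPI_Isend(buf, n, MPI_INT, dst, 0, comm, &req); buf[0] = 1;",
        fixed="MPI_Isend(buf, n, MPI_INT, dst, 0, comm, &req); MPI_Wait(&req, &st); buf[0] = 1;",
    ),
]


def tokens(text):
    """Lower-cased identifier-like tokens, used as a crude similarity signal."""
    return set(re.findall(r"[A-Za-z_]+", text.lower()))


def retrieve(code, catalog, k=2):
    """Rank catalog entries by Jaccard overlap between the query and each buggy snippet.

    A stand-in for a real retriever (assumption); an embedding index could replace it.
    """
    query = tokens(code)

    def score(ref):
        ref_tokens = tokens(ref.buggy)
        union = query | ref_tokens
        return len(query & ref_tokens) / len(union) if union else 0.0

    return sorted(catalog, key=score, reverse=True)[:k]


def build_prompt(code, catalog, k=2):
    """Assemble retrieved references as few-shot pairs, then ask for step-by-step reasoning."""
    parts = ["You are reviewing an MPI program for defects."]
    for i, ref in enumerate(retrieve(code, catalog, k), start=1):
        parts.append(
            f"Reference {i} ({ref.name}): {ref.description}\n"
            f"Buggy: {ref.buggy}\nFixed: {ref.fixed}"
        )
    parts.append(f"Program under review:\n{code}")
    parts.append(
        "Reason step by step about the communication pattern, "
        "state whether the program is buggy, and if so propose a fix."
    )
    return "\n\n".join(parts)


if __name__ == "__main__":
    snippet = "MPI_Isend(data, count, MPI_INT, 1, 0, MPI_COMM_WORLD, &request); data[0] = 42;"
    print(build_prompt(snippet, CATALOG))
```

In this sketch the retrieved references double as the few-shot examples, so the model sees buggy and fixed pairs that resemble the program under review before it is asked to reason. The resulting string would be sent to whichever LLM is being evaluated; that step is deliberately left out.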
