Data Race Detection Using Large Language Models

Le Chen; Xianzhong Ding; Murali Emani; Tristan Vanderbruggen; Pei-hung Lin; Chunhua Liao

arXiv:2308.07505 · cs.LG · November 28, 2023 · 1 citation

Open Access

TL;DR

This paper investigates large language models (LLMs) for data race detection in high-performance computing (HPC). It proposes a new dataset and fine-tuning methods and reports promising results, although LLMs still lag behind traditional tools.

Contribution

Introduces DRB-ML, a new data race detection dataset derived from DataRaceBench, and explores LLM-based approaches that combine prompt engineering with fine-tuning.
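The combination of prompting and fine-tuning can be pictured with a small sketch. Assuming a supervised fine-tuning setup that consumes prompt/completion pairs, the Python below turns one labeled program into such a pair. The prompt wording, the yes/no answer format, the helper names `number_lines` and `build_example`, and the sample OpenMP loop (modeled on DataRaceBench's loop-carried anti-dependence pattern) are all illustrative assumptions, not the paper's actual templates.

```python
# Hypothetical sketch: turning one labeled program into a prompt/completion
# pair for supervised fine-tuning. The prompt text and the "yes"/"no"
# answer format are assumptions, not the paper's exact template.

def number_lines(source: str) -> str:
    """Prefix each source line with its 1-based line number so a model can cite lines."""
    return "\n".join(f"{i}: {text}" for i, text in enumerate(source.splitlines(), start=1))


def build_example(source: str, has_race: bool) -> dict:
    """Return a prompt/completion record for a binary data race question."""
    prompt = (
        "You are an expert in OpenMP parallel programming.\n"
        "Does the following program contain a data race? Answer yes or no.\n\n"
        + number_lines(source)
    )
    return {"prompt": prompt, "completion": "yes" if has_race else "no"}


# Illustrative DataRaceBench-style loop: iteration i reads a[i + 1], which
# iteration i + 1 (possibly on another thread) writes, an anti-dependence race.
PROGRAM = """#include <omp.h>
int main(void) {
  int a[100];
  for (int i = 0; i < 100; i++) a[i] = i;
#pragma omp parallel for
  for (int i = 0; i < 99; i++)
    a[i] = a[i + 1] + 1;
  return 0;
}"""

example = build_example(PROGRAM, has_race=True)
print(example["completion"])  # prints "yes"
```

Numbering the lines is a deliberate choice in this sketch: a fine-grained answer has to cite line numbers, and a model shown only raw source has no stable way to point at a specific access.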

Findings

01. LLMs can be used for data race detection.

02. LLMs outperform traditional tools in some respects.

03. LLMs still fall short when detailed analysis of the variable pairs involved in a race is required.
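The gap between the first and third findings is easier to see once evaluation is split into two granularities. The following Python sketch, under assumed encodings (an access as a `(variable, line, op)` tuple, plus the hypothetical helpers `binary_accuracy` and `pair_recall`), scores a model once on the yes/no question and once on whether it names the exact racing pairs. These are not the paper's metrics; they only illustrate how a correct "yes" can coexist with a wrong pair.

```python
from typing import List, Tuple

# Assumed encoding: an access is (variable, line, op), with op "r" or "w".
Access = Tuple[str, int, str]
Pair = Tuple[Access, Access]


def normalize(pair: Pair) -> Pair:
    """Order-independent form, so (x, y) and (y, x) count as the same race."""
    a, b = sorted(pair)
    return (a, b)


def binary_accuracy(preds: List[bool], labels: List[bool]) -> float:
    """Share of programs where the yes/no race verdict matches the label."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def pair_recall(pred_pairs: List[Pair], true_pairs: List[Pair]) -> float:
    """Share of ground-truth racing pairs the model reported exactly."""
    truth = {normalize(p) for p in true_pairs}
    found = {normalize(p) for p in pred_pairs}
    return len(truth & found) / len(truth) if truth else 1.0


# Ground truth for a loop body `a[i] = a[i + 1] + 1;` on line 7 (illustrative).
truth = [(("a[i+1]", 7, "r"), ("a[i]", 7, "w"))]

# The model correctly says "race" but cites the wrong line for the read.
predicted = [(("a[i]", 7, "w"), ("a[i+1]", 6, "r"))]

print(binary_accuracy([True], [True]))  # 1.0
print(pair_recall(predicted, truth))    # 0.0
```

Reporting the same pair in reversed order would still score 1.0, because `normalize` sorts the two accesses; only the wrong line number causes the miss above.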

Abstract

Large language models (LLMs) are demonstrating significant promise as an alternative strategy to facilitate analyses and optimizations of high-performance computing programs, circumventing the need for resource-intensive manual tool creation. In this paper, we explore a novel LLM-based data race detection approach combining prompt engineering and fine-tuning techniques. We create a dedicated dataset named DRB-ML, which is derived from DataRaceBench, with fine-grained labels showing the presence of data race pairs and their associated variables, line numbers, and read/write information. DRB-ML is then used to evaluate representative LLMs and fine-tune open-source ones. Our experiments show that LLMs can be a viable approach to data race detection. However, they still cannot compete with traditional data race detection tools when we need detailed information about variable pairs causing…
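The fine-grained labels described in the abstract can be sketched as a small record type. This Python version is an assumption about shape only: the class and field names (`Access`, `RacePair`, `RaceLabel`, `var`, `line`, `op`), the consistency rule, and the sample values are hypothetical and are not taken from DRB-ML itself. It does encode one fact that holds for any data race: of the two conflicting accesses, at least one must be a write.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class Access:
    var: str   # variable or array expression, e.g. "a[i+1]"
    line: int  # 1-based source line of the access
    op: str    # "r" for read, "w" for write


@dataclass(frozen=True)
class RacePair:
    first: Access
    second: Access

    def is_well_formed(self) -> bool:
        """Both ops must be "r"/"w", and at least one access must be a write."""
        ops = (self.first.op, self.second.op)
        return all(op in ("r", "w") for op in ops) and "w" in ops


@dataclass
class RaceLabel:
    name: str       # illustrative program identifier
    has_race: bool  # coarse yes/no label
    pairs: List[RacePair] = field(default_factory=list)

    def is_consistent(self) -> bool:
        """A positive label needs at least one pair; a negative label needs none."""
        return self.has_race == bool(self.pairs) and all(
            p.is_well_formed() for p in self.pairs
        )

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses inside `pairs`.
        return json.dumps(asdict(self))


# Illustrative label for a loop body `a[i] = a[i + 1] + 1;` on line 7.
label = RaceLabel(
    name="antidep1-example",
    has_race=True,
    pairs=[RacePair(Access("a[i+1]", 7, "r"), Access("a[i]", 7, "w"))],
)
print(label.is_consistent())  # True
print(label.to_json())
```

Keeping `has_race` alongside `pairs` mirrors the two levels the abstract distinguishes: the coarse presence of a race and the detailed variable, line, and read/write information.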

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Parallel Computing and Optimization Techniques