Data Race Detection Using Large Language Models
Le Chen, Xianzhong Ding, Murali Emani, Tristan Vanderbruggen, Pei-Hung Lin, Chunhua Liao

TL;DR
This paper investigates large language models for data race detection in high-performance computing. It proposes a new dataset, DRB-ML, and combines prompt engineering with fine-tuning. The results are promising, but LLMs still lag behind traditional tools when detailed information about the racing variable pairs is required.
Contribution
Introduces DRB-ML, a new dataset for data race detection derived from DataRaceBench, and explores LLM-based approaches that combine prompt engineering and fine-tuning.
Findings
LLMs can be used for data race detection.
LLMs outperform traditional tools in some aspects.
LLMs still fall short of traditional tools when detailed information about the variable pairs causing a race is needed.
Abstract
Large language models (LLMs) are demonstrating significant promise as an alternative strategy to facilitate analyses and optimizations of high-performance computing programs, circumventing the need for resource-intensive manual tool creation. In this paper, we explore a novel LLM-based data race detection approach combining prompt engineering and fine-tuning techniques. We create a dedicated dataset named DRB-ML, which is derived from DataRaceBench, with fine-grained labels showing the presence of data race pairs and their associated variables, line numbers, and read/write information. DRB-ML is then used to evaluate representative LLMs and fine-tune open-source ones. Our experiment shows that LLMs can be a viable approach to data race detection. However, they still cannot compete with traditional data race detection tools when we need detailed information about variable pairs causing…
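To make the fine-grained labels described in the abstract concrete, the sketch below shows one way such a record and a detection prompt could look. It is an illustration only: the class names, field names, prompt wording, and the example snippet are assumptions made here, not the actual DRB-ML schema or the prompts used in the paper.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class AccessSite:
    """One side of a race pair: the variable, its line, and the access type."""
    variable: str   # expression as written in the source, e.g. "a[i + 1]"
    line: int       # 1-based line number within the code snippet
    operation: str  # "R" for read, "W" for write


@dataclass
class RacePair:
    """Two conflicting accesses (hypothetical layout, not DRB-ML's own)."""
    first: AccessSite
    second: AccessSite


@dataclass
class RaceLabel:
    """Illustrative label record: presence of a race plus its variable pairs."""
    name: str
    code: str
    has_race: bool
    pairs: list = field(default_factory=list)


def build_prompt(code: str) -> str:
    """Number the lines and ask for a yes/no verdict plus the racing pairs."""
    numbered = "\n".join(
        f"{i}: {text}" for i, text in enumerate(code.splitlines(), start=1)
    )
    return (
        "Does the following OpenMP code contain a data race? "
        "Answer yes or no, then list each pair of conflicting accesses "
        "with variable, line number, and read/write operation.\n\n"
        + numbered
    )


# A loop-carried anti-dependence: iteration i reads a[i + 1] while
# iteration i + 1 may write the same element concurrently.
EXAMPLE_CODE = (
    "#pragma omp parallel for\n"
    "for (i = 0; i < n - 1; i++)\n"
    "  a[i] = a[i + 1] + 1;"
)

label = RaceLabel(
    name="antidep-example",  # hypothetical identifier
    code=EXAMPLE_CODE,
    has_race=True,
    pairs=[RacePair(AccessSite("a[i]", 3, "W"), AccessSite("a[i + 1]", 3, "R"))],
)

prompt = build_prompt(label.code)
label_json = json.dumps(asdict(label), indent=2)
```

A record like this separates the coarse question (is there a race?) from the detailed one (which accesses conflict), which is the distinction the abstract draws when it compares LLMs with traditional tools.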
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Parallel Computing and Optimization Techniques
