Assessing Large Language Models in Comprehending and Verifying Concurrent Programs across Memory Models

Ridhi Jain; Rahul Purandare

arXiv:2501.14326 · cs.SE · September 5, 2025 · 2 cites

Open Access

TL;DR

This paper evaluates large language models' ability to understand and verify concurrency issues in programs across different memory models, highlighting their strengths and limitations in handling complex relaxed memory scenarios.

Contribution

The paper assesses how well leading LLMs analyze concurrent programs under several memory models, including the relaxed models TSO and PSO, which it identifies as a novel focus.

Findings

01. GPT-4 and similar models effectively identify concurrency issues under sequential consistency.

02. All models struggle to verify correctness under relaxed memory models.

03. Models show limitations in understanding memory ordering constraints in complex scenarios.

Abstract

As concurrent programming becomes increasingly prevalent, effectively identifying and addressing concurrency issues such as data races and deadlocks is critical. This study evaluates the performance of several leading large language models (LLMs), including GPT-3.5-turbo, GPT-4, GPT-4o, GPT-4o-mini, and Mistral-AI's Large2, in understanding and analyzing concurrency issues within software programs. Given that relaxed memory models, such as Total Store Order (TSO) and Partial Store Order (PSO), are widely implemented and adopted in modern systems, supported even by commodity architectures like ARM and x86, our evaluation focuses not only on sequentially consistent memory models but also on these relaxed memory models. Specifically, we assess two main aspects: the models' capacity to detect concurrency problems under a sequentially consistent memory model and their ability to verify the…
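To make the distinction between sequential consistency, TSO, and PSO concrete, the following is a minimal sketch that enumerates the outcomes of two classic litmus tests (store buffering and message passing). It is not the paper's benchmark or methodology. The function `explore`, the helper `put`, and the test programs are hypothetical names introduced here, and the model is a simplified operational one: TSO is approximated by one FIFO store buffer per thread, PSO by keeping FIFO order only among stores to the same variable, and fences are not modeled.

```python
def put(pairs, key, value):
    """Return a sorted tuple of (key, value) pairs with `key` set to `value`."""
    updated = dict(pairs)
    updated[key] = value
    return tuple(sorted(updated.items()))


def explore(threads, model, init):
    """Return every final register state reachable under `model`.

    threads: list of programs; each op is ("store", var, value) or ("load", var, reg).
    model:   "SC"  (stores hit memory immediately),
             "TSO" (assumed: one FIFO store buffer per thread), or
             "PSO" (assumed: FIFO order kept only among stores to one variable).
    """
    finals, seen = set(), set()
    start = (tuple(0 for _ in threads),    # program counters
             tuple(sorted(init.items())),  # shared memory
             tuple(() for _ in threads),   # per-thread store buffers
             ())                           # registers loaded so far
    stack = [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, mem, bufs, regs = state
        finished = all(pc == len(prog) for pc, prog in zip(pcs, threads))
        if finished and not any(bufs):
            finals.add(regs)
            continue
        for i, prog in enumerate(threads):
            # Step 1: thread i executes its next instruction, if any.
            if pcs[i] < len(prog):
                op, var, arg = prog[pcs[i]]
                next_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if op == "store" and model == "SC":
                    stack.append((next_pcs, put(mem, var, arg), bufs, regs))
                elif op == "store":
                    grown = bufs[i] + ((var, arg),)
                    stack.append((next_pcs, mem, bufs[:i] + (grown,) + bufs[i + 1:], regs))
                else:
                    # A load sees the thread's own newest buffered store first.
                    pending = [v for (x, v) in bufs[i] if x == var]
                    value = pending[-1] if pending else dict(mem)[var]
                    stack.append((next_pcs, mem, bufs, put(regs, arg, value)))
            # Step 2: thread i's buffer drains one eligible store to memory.
            for k, (x, v) in enumerate(bufs[i]):
                older_same_var = any(y == x for (y, _) in bufs[i][:k])
                if (model == "TSO" and k == 0) or (model == "PSO" and not older_same_var):
                    shrunk = bufs[i][:k] + bufs[i][k + 1:]
                    stack.append((pcs, put(mem, x, v), bufs[:i] + (shrunk,) + bufs[i + 1:], regs))
    return finals


# Store buffering: can both loads miss the other thread's store?
SB = [[("store", "x", 1), ("load", "y", "r0")],
      [("store", "y", 1), ("load", "x", "r1")]]
SB_INIT = {"x": 0, "y": 0}
SB_WEAK = (("r0", 0), ("r1", 0))

# Message passing: can the flag be seen set while the data is still stale?
MP = [[("store", "data", 1), ("store", "flag", 1)],
      [("load", "flag", "r0"), ("load", "data", "r1")]]
MP_INIT = {"data": 0, "flag": 0}
MP_WEAK = (("r0", 1), ("r1", 0))

for model in ("SC", "TSO", "PSO"):
    print(model,
          SB_WEAK in explore(SB, model, SB_INIT),
          MP_WEAK in explore(MP, model, MP_INIT))
```

Under these assumptions the sketch reports the weak store-buffering outcome as unreachable under SC but reachable under TSO and PSO, and the weak message-passing outcome as reachable only under PSO. This is the kind of model-dependent verdict the abstract describes, where the same program can be correct under one memory model and incorrect under another.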

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Software Testing and Debugging Techniques · Software Engineering Research