Comparative Analysis of the Code Generated by Popular Large Language Models (LLMs) for MISRA C++ Compliance
Malik Muhammad Umer

TL;DR
This study compares how well popular large language models generate C++ code that complies with MISRA standards for safety-critical systems, revealing current limitations and potential improvements.
Contribution
It provides a systematic comparison of LLMs' ability to produce MISRA-compliant C++ code and highlights the need for further refinement for safety-critical applications.
Findings
None of the LLMs generated code that fully complies with MISRA C++ standards.
Among the models compared, DeepSeek had the fewest violations.
ChatGPT can identify and fix violations more effectively.
Abstract
Safety-critical systems are engineered systems whose failure or malfunction could result in catastrophic consequences. Software development for safety-critical systems necessitates rigorous engineering practices and adherence to certification standards such as DO-178C for avionics. DO-178C is a guidance document that requires compliance with well-defined software coding standards, such as MISRA C++, to enforce coding guidelines that prevent the use of ambiguous, unsafe, or undefined constructs. Large Language Models (LLMs) have demonstrated significant capabilities in automatic code generation across a wide range of programming languages, including C++. Despite their impressive performance, code generated by LLMs in safety-critical domains must be carefully analyzed for conformance to MISRA C++ coding standards. In this paper, I have conducted a comparative analysis of the C++ code generated…
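To make "ambiguous, unsafe, or undefined constructs" concrete, here is a minimal sketch of the kind of rewrite that safety-oriented coding guidelines tend to encourage. It is an illustration only: it does not quote any specific MISRA C++ rule, it is not taken from the paper's test set, and the function name `checked_add` is hypothetical. It assumes a C++17 compiler, since it uses `std::optional`.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Discouraged style (shown only as a comment): plain `int` has an
// implementation-defined width, and signed overflow is undefined behavior.
//   int add(int a, int b) { return a + b; }

// Hypothetical helper: fixed-width operands, an explicit range check before
// the addition, and a single return point. Returns an empty optional when
// the mathematical sum does not fit in std::int32_t.
std::optional<std::int32_t> checked_add(std::int32_t a, std::int32_t b) noexcept
{
    constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max();
    constexpr std::int32_t kMin = std::numeric_limits<std::int32_t>::min();

    std::optional<std::int32_t> result{};

    if ((b > 0) && (a > (kMax - b)))
    {
        // Sum would exceed kMax: leave result empty.
    }
    else if ((b < 0) && (a < (kMin - b)))
    {
        // Sum would fall below kMin: leave result empty.
    }
    else
    {
        result = a + b;
    }

    return result;
}
```

A caller would test `has_value()` before reading the sum, so the overflow case is handled explicitly rather than left to undefined behavior.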
