Hallucination in LLM-Based Code Generation: An Automotive Case Study
Marc Pavel, Nenad Petrovic, Lukasz Mazur, Vahid Zolfaghari, Fengjunjie Pan, Alois Knoll

TL;DR
This study examines hallucination issues in LLM-based automotive code generation, revealing high error rates and the importance of context-rich prompts for correct outputs, emphasizing safety-critical application challenges.
Contribution
It provides a detailed case study on hallucinations in automotive code generation with LLMs, highlighting the impact of prompt complexity on output correctness.
Findings
High frequency of syntax and reference errors in models
Context-rich prompts improve correctness in GPT-4.1 and GPT-4o
Simpler prompts often fail to produce valid code
Abstract
Large Language Models (LLMs) have shown significant potential in automating code generation tasks offering new opportunities across software engineering domains. However, their practical application remains limited due to hallucinations - outputs that appear plausible but are factually incorrect, unverifiable or nonsensical. This paper investigates hallucination phenomena in the context of code generation with a specific focus on the automotive domain. A case study is presented that evaluates multiple code LLMs for three different prompting complexities ranging from a minimal one-liner prompt to a prompt with Covesa Vehicle Signal Specifications (VSS) as additional context and finally to a prompt with an additional code skeleton. The evaluation reveals a high frequency of syntax violations, invalid reference errors and API knowledge conflicts in state-of-the-art models GPT-4.1, Codex…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
