LLMs Can Check Their Own Results to Mitigate Hallucinations in Traffic Understanding Tasks
Malsha Ashani Mahawatta Dona, Beatriz Cabrero-Daniel, Yinan Yu, Christian Berger

TL;DR
This paper evaluates the effectiveness of SelfCheckGPT in detecting and filtering hallucinations in traffic-related image captions generated by state-of-the-art LLMs, demonstrating its potential to improve reliability in automotive perception tasks.
Contribution
The study systematically assesses SelfCheckGPT's ability to identify hallucinations in LLM-generated traffic image captions across multiple datasets and models, highlighting its practical utility.
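As a rough illustration of the self-consistency idea behind SelfCheckGPT, the sketch below scores each sentence of a caption by how many independently sampled captions of the same image fail to support it, then drops the sentences most samples disagree with. It is a toy under stated assumptions, not the paper's pipeline: the token-overlap `support` measure is a crude stand-in for the scoring a real SelfCheckGPT setup would use, and the function names, the thresholds, and the example captions are all hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support(sentence, sample):
    """Fraction of the sentence's tokens that also occur in the sample.

    Assumption: token overlap is only a stand-in for a proper
    consistency measure; it is not the scoring used in the paper.
    """
    sent = tokens(sentence)
    if not sent:
        return 1.0
    return len(sent & tokens(sample)) / len(sent)


def hallucination_scores(caption, samples, threshold=0.6):
    """Return (sentence, score) pairs; score is the share of samples
    that do not support the sentence (0.0 = consistent, 1.0 = suspect)."""
    if not samples:
        raise ValueError("at least one sampled caption is required")
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", caption) if s.strip()]
    scores = []
    for sentence in sentences:
        unsupported = sum(1 for smp in samples if support(sentence, smp) < threshold)
        scores.append((sentence, unsupported / len(samples)))
    return scores


def filter_caption(caption, samples, max_score=0.5):
    """Keep only the sentences whose hallucination score is at most max_score."""
    kept = [s for s, score in hallucination_scores(caption, samples) if score <= max_score]
    return " ".join(kept)


if __name__ == "__main__":
    # Hypothetical captions, invented for illustration only.
    caption = "A red car waits at the crossing. A cyclist rides on the sidewalk."
    samples = [
        "A red car waits at the crossing.",
        "The red car is at a crossing.",
        "A red car waits near the crossing.",
    ]
    for sentence, score in hallucination_scores(caption, samples):
        print(f"{score:.2f}  {sentence}")
    print(filter_caption(caption, samples))
```

In this sketch a sentence that none of the sampled captions echo receives a high score and is filtered out, which mirrors the detect-and-filter use described above. The two thresholds are arbitrary and would need tuning against labelled data.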
Findings
GPT-4o outperforms LLaVA in generating faithful captions.
Dataset type does not significantly impact caption quality or hallucination detection.
Models perform better on daytime images than on dawn, dusk, or night images.
Abstract
Today's Large Language Models (LLMs) have showcased exemplary capabilities, ranging from simple text generation to advanced image processing. Such models are currently being explored for in-vehicle services such as supporting perception tasks in Advanced Driver Assistance Systems (ADAS) or Autonomous Driving (AD) systems, given the LLMs' capabilities to process multi-modal data. However, LLMs often generate nonsensical or unfaithful information, known as "hallucinations": a notable issue that needs to be mitigated. In this paper, we systematically explore the adoption of SelfCheckGPT to spot hallucinations by three state-of-the-art LLMs (GPT-4o, LLaVA, and Llama3) when analysing visual automotive data from two sources: Waymo Open Dataset, from the US, and PREPER CITY dataset, from Sweden. Our results show that GPT-4o is better at generating faithful image captions than LLaVA, whereas…
Taxonomy
Topics: EEG and Brain-Computer Interfaces · ECG Monitoring and Analysis
