A Low-Rank Method for Vision Language Model Hallucination Mitigation in Autonomous Driving

Keke Long; Jiacheng Guo; Tianyun Zhang; Hongkai Yu; Xiaopeng Li

arXiv:2511.06496·cs.RO·November 11, 2025

TL;DR

This paper introduces a self-contained low-rank method that ranks candidate captions from vision language models by hallucination level in autonomous driving scenes and selects the most hallucination-free one, improving accuracy and efficiency without external references or access to model internals.

Contribution

A novel low-rank decomposition approach that ranks VLM captions using only sentence embeddings of the captions themselves, enhancing hallucination detection in autonomous driving.

Findings

01. Achieves 87% accuracy in identifying hallucination-free captions

02. Reduces inference time by 51-67% compared to debate methods

03. Strong correlation between residuals and human hallucination judgments

Abstract

Vision Language Models (VLMs) are increasingly used in autonomous driving to help understand traffic scenes, but they sometimes produce hallucinations, which are false details not grounded in the visual input. Detecting and mitigating hallucinations is challenging when ground-truth references are unavailable and model internals are inaccessible. This paper proposes a novel self-contained low-rank approach to automatically rank candidate captions generated by multiple VLMs based on their hallucination levels, using only the captions themselves without requiring external references or model access. By constructing a sentence-embedding matrix and decomposing it into a low-rank consensus component and a sparse residual, we use the residual magnitude to rank captions, selecting the one with the smallest residual as the most hallucination-free. Experiments on the NuScenes dataset…
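The decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the solver, so the code below is an assumption rather than the authors' method: it alternates a truncated SVD (low-rank consensus) with soft thresholding (sparse residual), and it uses hand-built toy vectors in place of real sentence embeddings. The names `soft_threshold`, `low_rank_plus_sparse`, and `rank_captions`, and the parameters `rank`, `tau`, and `n_iter`, are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink entries toward zero; entries with |x| <= tau become exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def low_rank_plus_sparse(M, rank=1, tau=0.05, n_iter=50):
    """Split M into a rank-limited part L and a sparse part S (M ~ L + S).

    Assumed solver (not taken from the paper): alternate between a
    truncated SVD of M - S and soft thresholding of M - L.
    """
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * sig[:rank]) @ Vt[:rank]  # low-rank consensus
        S = soft_threshold(M - L, tau)              # sparse residual
    return L, S

def rank_captions(embeddings, **kwargs):
    """Return caption indices sorted by residual norm (smallest first)."""
    _, S = low_rank_plus_sparse(embeddings, **kwargs)
    scores = np.linalg.norm(S, axis=1)  # one residual magnitude per caption
    return np.argsort(scores, kind="stable"), scores

# Toy stand-in for sentence embeddings: four captions that roughly agree
# and one (index 4) that deviates on two coordinates.
dim = 8
base = np.ones(dim) / np.sqrt(dim)
rows = []
for i in range(4):
    v = base.copy()
    v[i] += 0.02            # small caption-to-caption variation
    rows.append(v)
outlier = base.copy()
outlier[6:] += 0.8          # simulated off-consensus content
rows.append(outlier)

M = np.array(rows)
M /= np.linalg.norm(M, axis=1, keepdims=True)  # unit-normalize each row

ranking, scores = rank_captions(M)
print("ranking (smallest residual first):", ranking)
print("residual norms:", np.round(scores, 3))
```

In this toy setup the first index in `ranking` plays the role of the caption selected as most hallucination-free, and the deviating row should land last. With real captions, each row of `M` would come from a sentence-embedding model, and `rank` and `tau` would need tuning.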

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis