A Low-Rank Method for Vision Language Model Hallucination Mitigation in Autonomous Driving
Keke Long, Jiacheng Guo, Tianyun Zhang, Hongkai Yu, Xiaopeng Li

TL;DR
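This paper introduces a self-contained, low-rank method that ranks captions generated by multiple vision language models for autonomous driving according to their hallucination level and selects the most hallucination-free one. It needs no external references or model access, and it achieves high selection accuracy with lower inference time than debate-based methods.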
Contribution
A novel low-rank decomposition approach that ranks VLM captions using only sentence embeddings of the captions themselves, enabling hallucination detection in autonomous driving without ground-truth references or access to model internals.
Findings
Achieves 87% accuracy in identifying hallucination-free captions
Reduces inference time by 51-67% compared to debate methods
Strong correlation between residuals and human hallucination judgments
Abstract
Vision Language Models (VLMs) are increasingly used in autonomous driving to help understand traffic scenes, but they sometimes produce hallucinations, which are false details not grounded in the visual input. Detecting and mitigating hallucinations is challenging when ground-truth references are unavailable and model internals are inaccessible. This paper proposes a novel self-contained low-rank approach that automatically ranks multiple candidate captions, generated by multiple VLMs, by their hallucination levels, using only the captions themselves and requiring neither external references nor model access. By constructing a sentence-embedding matrix and decomposing it into a low-rank consensus component and a sparse residual, we use the residual magnitude to rank the captions and select the one with the smallest residual as the most hallucination-free. Experiments on the NuScenes dataset…
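The abstract does not state which decomposition algorithm, embedding model, or residual measure the authors use, so the following is only a minimal sketch under stated assumptions. It treats "low-rank consensus plus sparse residual" as Robust PCA (principal component pursuit), min ||L||_* + λ||S||_1 subject to E = L + S, solved with the inexact augmented Lagrange multiplier method and λ = 1/sqrt(max(n, d)). Each row of E is one caption's embedding, rows are unit-normalized, and a caption's residual magnitude is taken to be the L2 norm of its row of S. The function names robust_pca and rank_captions are hypothetical, and the embeddings are synthetic stand-ins rather than outputs of a real sentence encoder.

```python
import numpy as np


def robust_pca(M, lam=None, tol=1e-7, max_iter=500):
    """Split M into low-rank L plus sparse S (principal component pursuit,
    solved with the inexact augmented Lagrange multiplier method).
    Assumption: this is one possible reading of the paper's decomposition."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    norm_two = np.linalg.norm(M, 2)  # largest singular value
    norm_fro = np.linalg.norm(M, "fro")
    Y = M / max(norm_two, np.abs(M).max() / lam)  # dual variable init
    mu, mu_max, rho = 1.25 / norm_two, 1.25e7 / norm_two, 1.5
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # L-step: singular value thresholding (shrink singular values by 1/mu)
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: elementwise soft thresholding at lam/mu
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        Z = M - L - S  # violation of the constraint M = L + S
        Y = Y + mu * Z
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z, "fro") / norm_fro < tol:
            break
    return L, S


def rank_captions(embeddings, lam=None):
    """Return caption indices from smallest to largest residual (best first)
    and the per-caption residual magnitudes (hypothetical helper)."""
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)  # assumption: unit rows
    _, S = robust_pca(E, lam)
    residuals = np.linalg.norm(S, axis=1)  # L2 norm of each caption's row of S
    return np.argsort(residuals), residuals


# Synthetic stand-in for sentence embeddings of 5 candidate captions:
# all share one consensus vector; caption 3 gets a few large, sparse deviations
# (playing the role of hallucinated details).
rng = np.random.default_rng(0)
consensus = rng.normal(size=64)
emb = consensus + 0.01 * rng.normal(size=(5, 64))
emb[3, rng.choice(64, size=6, replace=False)] += 3.0

order, residuals = rank_captions(emb)
print("ranking (best first):", order)
print("selected caption:", order[0])
```

In this toy setup the deviating caption receives the largest residual and is ranked last. With real captions, the rows of E would come from a sentence encoder, and the choice of λ, the normalization, and the residual norm are all design knobs that the abstract leaves open.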
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
