Hallucination Elimination and Semantic Enhancement Framework for Vision-Language Models in Traffic Scenarios

Jiaqi Fan; Jianhua Wu; Hongqing Chu; Quanbo Ge; Bingzhao Gao

arXiv:2412.07518·cs.CV·December 11, 2024

TL;DR

This paper introduces HCOENet, a chain-of-thought correction framework that reduces hallucinations and enhances semantic understanding in vision-language models for traffic scenarios, improving accuracy and reliability.

Contribution

HCOENet is a novel plug-and-play correction method that filters hallucinated entities and extracts critical objects, advancing multimodal understanding in autonomous driving contexts.

Findings

01. HCOENet improves the F1-score of Mini-InternVL-4B by 12.58% on the POPE benchmark.

02. HCOENet improves description quality at reduced computational cost.

03. The authors created two new datasets, CODA_desc and nuScenes_desc, for traffic scene understanding.
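
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how an F1-score can be computed for yes/no object-existence answers of the kind POPE uses. The function name `f1_yes_no` and the toy answers are illustrative assumptions, not taken from the paper or its repository, and the sketch does not reproduce the reported 12.58% figure.

```python
# Minimal sketch: F1-score over yes/no answers, treating "yes" as the positive class.
# The function name and the toy data below are illustrative assumptions.

def f1_yes_no(predictions, labels, positive="yes"):
    """Return the F1-score of `predictions` against `labels`."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, l in pairs if p == positive and l == positive)
    fp = sum(1 for p, l in pairs if p == positive and l != positive)
    fn = sum(1 for p, l in pairs if p != positive and l == positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: model answers to "Is there a <object> in the image?"
model_answers = ["yes", "yes", "no", "yes", "no", "no"]
ground_truth = ["yes", "no", "no", "yes", "yes", "no"]
score = f1_yes_no(model_answers, ground_truth)
```

A "yes" given for an absent object (a false positive) is the hallucination case, so it lowers precision and therefore the F1-score.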

Abstract

Large vision-language models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding and generation tasks. However, these models occasionally generate hallucinated text, producing descriptions that seem reasonable but do not correspond to the image. In an autonomous driving system, such hallucinations can lead to incorrect driving decisions. To address this challenge, this paper proposes HCOENet, a plug-and-play chain-of-thought correction method designed to eliminate object hallucinations and generate enhanced descriptions for critical objects overlooked in the initial response. Specifically, HCOENet employs a cross-checking mechanism to filter entities and directly extracts critical objects from the given image, enriching the descriptive text. Experimental results on the POPE benchmark demonstrate that HCOENet improves the F1-score of the Mini-InternVL-4B and…
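
The two steps named in the abstract, filtering entities by cross-checking and adding critical objects that the initial response missed, can be illustrated with a small sketch. The abstract does not say how the cross-check is performed, so everything here is an assumption for illustration: the sketch supposes two independent checkers that each report a set of object names for the image, and the names `cross_check` and `missing_critical_objects` are hypothetical rather than part of HCOENet's code.

```python
# Illustrative sketch only: the checker sets and function names are assumptions,
# not the HCOENet implementation.

def cross_check(mentioned, checker_a, checker_b):
    """Split entities from a description into confirmed and hallucinated.

    An entity is kept only if both (assumed) independent checkers report it.
    """
    confirmed = [e for e in mentioned if e in checker_a and e in checker_b]
    hallucinated = [e for e in mentioned if e not in confirmed]
    return confirmed, hallucinated


def missing_critical_objects(confirmed, critical_in_image):
    """Return critical objects found in the image but absent from the description."""
    return [obj for obj in critical_in_image if obj not in confirmed]


# Entities mentioned in a model's initial description of a traffic scene.
mentioned = ["car", "pedestrian", "traffic light", "dog"]
# Object names reported by two hypothetical checkers for the same image.
checker_a = {"car", "pedestrian", "traffic light", "cone"}
checker_b = {"car", "pedestrian", "cone"}

confirmed, hallucinated = cross_check(mentioned, checker_a, checker_b)
to_add = missing_critical_objects(confirmed, ["cone", "pedestrian"])
```

In this toy run the description keeps "car" and "pedestrian", drops the two unconfirmed entities, and flags "cone" as a critical object to describe. Requiring agreement from both checkers is a conservative design choice made for the sketch; a real system could weigh the checkers differently.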

Code & Models

Repositories

fjq-tongji/hcoenet
PyTorch · Official

Taxonomy

Topics: Data Visualization and Analytics