First Logit Boosting: Visual Grounding Method to Mitigate Object Hallucination in Large Vision-Language Models

Jiwoo Ha; Jongwoo Baek; Jinhyun So

arXiv:2604.00455 · cs.CV · April 2, 2026

TL;DR

The paper introduces First Logit Boosting (FLB), a training-free method that reduces object hallucination in large vision-language models by stabilizing visual grounding during text generation.

Contribution

FLB is a simple, training-free technique that mitigates both the long-term decay of visual information and object hallucination in LVLMs, with minimal inference overhead.

Findings

1. FLB significantly reduces object hallucination across various tasks and models.
2. FLB maintains visual grounding throughout generation, preventing long-term decay.
3. FLB adds negligible inference overhead, making it suitable for real-time systems.

Abstract

Recent Large Vision-Language Models (LVLMs) have demonstrated remarkable performance across various multimodal tasks that require understanding both visual and linguistic inputs. However, object hallucination -- the generation of nonexistent objects in answers -- remains a persistent challenge. Although several approaches such as retraining and external grounding methods have been proposed to mitigate this issue, they still suffer from high data costs or structural complexity. Training-free methods such as Contrastive Decoding (CD) are more cost-effective, avoiding additional training or external models, but still suffer from long-term decay, where visual grounding weakens and language priors dominate as the generation progresses. In this paper, we propose First Logit Boosting (FLB), a simple yet effective training-free technique designed to alleviate long-term decay in LVLMs. FLB…
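
The abstract compares FLB with Contrastive Decoding (CD), a training-free family of methods that adjusts next-token logits at inference time. The abstract is truncated before it describes FLB's own mechanism, so the sketch below does not implement FLB. It is a minimal sketch of one common CD variant, visual contrastive decoding (VCD), for a single decoding step. It assumes the model has already produced two logit vectors over the vocabulary: one conditioned on the original image and one conditioned on a distorted copy of it. The function names and the default values of `alpha` (contrast strength) and `beta` (plausibility cutoff) are illustrative assumptions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax over one vocabulary-sized logit vector.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(clean: List[float], distorted: List[float],
                       alpha: float = 1.0, beta: float = 0.1) -> List[float]:
    """One decoding step of a VCD-style contrastive adjustment (illustrative sketch).

    clean:     logits conditioned on the original image and prompt
    distorted: logits conditioned on a distorted image (e.g. a noised copy) and the same prompt
    alpha:     contrast strength (hypothetical default)
    beta:      keep only tokens whose clean probability is at least beta * max probability
    """
    probs = softmax(clean)
    cutoff = beta * max(probs)
    out = []
    for z_c, z_d, p in zip(clean, distorted, probs):
        if p < cutoff:
            # Adaptive plausibility constraint: drop tokens the clean view finds implausible.
            out.append(float("-inf"))
        else:
            # Boost evidence the real image provides beyond what the distorted view
            # (closer to the language prior alone) would predict.
            out.append((1 + alpha) * z_c - alpha * z_d)
    return out


if __name__ == "__main__":
    adjusted = contrastive_logits([2.0, 1.9, -3.0], [2.5, 1.0, -3.0])
    best = max(range(len(adjusted)), key=lambda i: adjusted[i])
    print(adjusted, "argmax:", best)
```

With `clean = [2.0, 1.9, -3.0]` and `distorted = [2.5, 1.0, -3.0]`, token 0 leads under the clean logits but is predicted even more strongly from the distorted image, so the contrast moves the choice to token 1, and token 2 is masked by the plausibility cutoff. Each step is corrected independently, with nothing carried over from earlier steps.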

Code & Models

Repositories

jiwooha20/FLB (GitHub)