SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding

Woohyeon Park; Woojin Kim; Jaeik Kim; Jaeyoung Do

arXiv:2506.08391 · cs.CV · March 31, 2026

TL;DR

SECOND introduces a novel decoding approach for vision-language models that reduces object hallucination by selectively and contrastively integrating multi-scale visual information, aligning more closely with human perception.

Contribution

It presents a new method, SECOND, that leverages multi-scale visual information with an object-centric approach to mitigate hallucinations in VLMs.

Findings

01

SECOND significantly reduces perceptual hallucinations.

02

It outperforms existing methods on a range of visual understanding benchmarks.

03

Prioritizing and contrasting across scales enhances VLM performance.

Abstract

Despite significant advances in Vision-Language Models (VLMs), the performance of existing VLMs remains hindered by object hallucination, a critical obstacle to accurate visual understanding. To address this issue, we propose SECOND: Selective and Contrastive Decoding, a novel approach that enables VLMs to leverage multi-scale visual information in an object-centric manner, closely aligning with human visual perception. SECOND progressively selects and integrates multi-scale visual information, facilitating a more precise interpretation of images. By iteratively contrasting this visual information across scales, SECOND significantly reduces perceptual hallucinations and outperforms existing approaches across a wide range of benchmarks. Our theoretical analysis and experiments highlight the largely unexplored potential of multi-scale application in VLMs, showing that prioritizing and contrasting…
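The abstract does not give the decoding rule itself, so the following is only a minimal sketch of the general contrastive-decoding idea it refers to, not the SECOND formulation. It assumes two sets of next-token logits from the same model, one conditioned on a finer-scale view of the image and one on a coarser-scale view, and combines them with the commonly used form (1 + alpha) * fine - alpha * coarse. The function names, the `alpha` parameter, and the toy three-token vocabulary are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(fine_logits, coarse_logits, alpha=1.0):
    """Generic contrastive combination (an assumption, not the paper's rule).

    Amplifies tokens supported by the fine-scale view and penalizes tokens
    that the coarse-scale view favors. alpha=0 returns the fine logits.
    """
    if len(fine_logits) != len(coarse_logits):
        raise ValueError("logit vectors must have the same length")
    return [
        (1.0 + alpha) * f - alpha * c
        for f, c in zip(fine_logits, coarse_logits)
    ]


def greedy_token(fine_logits, coarse_logits, alpha=1.0):
    """Index of the most probable token after contrasting the two views."""
    probs = softmax(contrastive_logits(fine_logits, coarse_logits, alpha))
    return max(range(len(probs)), key=probs.__getitem__)


# Toy vocabulary (hypothetical): the coarse view leans toward "frisbee",
# an object that is not clearly supported by the fine view.
vocab = ["cat", "dog", "frisbee"]
fine = [2.0, 0.5, 1.9]
coarse = [1.0, 0.5, 2.0]

chosen = vocab[greedy_token(fine, coarse, alpha=1.0)]
print(chosen)
```

In this toy case the contrast widens the margin between "cat" and "frisbee", because the token that the coarse view favors is pushed down. How SECOND selects scales and iterates the contrast is described in the paper itself and is not modeled here.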

Videos

SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding (SlidesLive)