Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation
Jihao Gu, Yingyao Wang, Meng Cao, Pi Bu, Jun Song, Yancheng He, Shilong Li, Bo Zheng

TL;DR
This paper introduces TPO, a novel token preference optimization method with self-calibrated visual-anchored rewards, significantly reducing hallucinations in large vision-language models by focusing on visual-correlated tokens without detailed annotations.
Contribution
The paper presents a new TPO model that adaptively emphasizes visual-anchored tokens using self-calibrated rewards, improving hallucination mitigation in LVLMs.
Findings
Achieves state-of-the-art hallucination mitigation performance.
Boosts performance on hallucination benchmarks when built on LLaVA-1.5-7B.
Effectively attends to visual-correlated tokens without fine-grained annotations.
Abstract
Direct Preference Optimization (DPO) has been demonstrated to be highly effective in mitigating hallucinations in Large Vision Language Models (LVLMs) by aligning their outputs more closely with human preferences. Despite the recent progress, existing methods suffer from two drawbacks: 1) Lack of scalable token-level rewards; and 2) Neglect of visual-anchored tokens. To this end, we propose a novel Token Preference Optimization model with self-calibrated rewards (dubbed as TPO), which adaptively attends to visual-correlated tokens without fine-grained annotations. Specifically, we introduce a token-level visual-anchored reward as the difference of the logistic distributions of generated tokens conditioned on the raw image and the corrupted one. In addition, to highlight the informative visual-anchored tokens, a visual-aware training objective is proposed to enhance more…
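To make the idea of a token-level visual-anchored signal concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes the signal can be approximated as the log-probability of each generated token given the raw image minus its log-probability given the corrupted image. The function names (`log_softmax`, `visual_anchored_scores`) and the toy logits are hypothetical, and the paper's exact reward and its self-calibration step are not reproduced here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def visual_anchored_scores(logits_clean, logits_corrupt, token_ids):
    """Per-token score: log p(token | raw image) - log p(token | corrupted image).

    Assumption for illustration only: a larger score means the token depends
    more strongly on the visual input.
    """
    scores = []
    for clean, corrupt, tok in zip(logits_clean, logits_corrupt, token_ids):
        lp_clean = log_softmax(clean)[tok]
        lp_corrupt = log_softmax(corrupt)[tok]
        scores.append(lp_clean - lp_corrupt)
    return scores


# Toy data (hypothetical): two generated tokens over a vocabulary of size 3.
logits_clean = [[2.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
logits_corrupt = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
token_ids = [0, 1]

scores = visual_anchored_scores(logits_clean, logits_corrupt, token_ids)
```

In this toy case the first token is more likely with the raw image than with the corrupted one, so its score is positive, while the second token is unaffected by the corruption and scores zero. Under the assumption above, a token-level objective could weight the first token more heavily than the second.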
Taxonomy
Topics: Psychedelics and Drug Studies · Functional Brain Connectivity Studies · Hallucinations in medical conditions
