Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models
Zhengtao Zou, Ya Gao, Jiarui Guan, Bin Li, Pekka Marttinen

TL;DR
The paper introduces RUDDER, a low-overhead decoding regulation framework that mitigates hallucinations in large vision-language models by creating a persistent visual anchor and adaptively injecting evidence during decoding.
Contribution
RUDDER is a novel method that counters visual information dilution in LVLMs through residual-based evidence injection with adaptive gating, improving hallucination mitigation with minimal latency.
Findings
RUDDER reduces hallucination metrics by approximately 24% on average.
It maintains over 96% throughput across various architectures.
It is effective across multiple large vision-language models.
Abstract
Large Vision-Language Models (LVLMs) typically process visual inputs as a prefix to the language decoder. As the model autoregressively generates text, this initial visual information inevitably undergoes "dilution," leading the model to over-rely on language priors and hallucinate objects. Existing interventions attempt to correct this by contrasting logits or iteratively refining outputs, but they incur prohibitive latency costs. We propose Residual-Update Directed DEcoding Regulation (RUDDER), a framework that counters visual dilution by creating a persistent visual anchor. We extract a robust evidence direction (CARD) directly from the model's prefill residual updates and inject it into the decoding process. This injection is modulated by an adaptive gate, the Beta Gate, which acts as a trust mechanism and ensures the visual reminder is applied only when necessary. Experiments on…
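The abstract's two steps (extract an evidence direction at prefill, then inject it under a gate during decoding) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the formulas for CARD or the Beta Gate, so `evidence_direction`, `alignment_gate`, and `steer` are hypothetical names, the direction is taken as a simple normalized mean of per-token residual updates, and the gate is a plain cosine-based stand-in rather than the paper's Beta Gate.

```python
import math

def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def evidence_direction(prefill_updates):
    """Assumed stand-in for the evidence direction: the normalized mean of
    per-token residual updates collected once during prefill."""
    dim = len(prefill_updates[0])
    count = len(prefill_updates)
    mean = [sum(u[i] for u in prefill_updates) / count for i in range(dim)]
    return normalize(mean)

def alignment_gate(hidden, direction, max_strength=1.0):
    """Hypothetical gate (not the paper's Beta Gate): the less the current
    hidden state aligns with the direction, the stronger the injection."""
    cos = sum(a * b for a, b in zip(normalize(hidden), direction))
    cos = max(-1.0, min(1.0, cos))  # guard against floating-point drift
    return max_strength * (1.0 - cos) / 2.0  # lies in [0, max_strength]

def steer(hidden, direction, max_strength=1.0):
    """Add the gated direction to one decoding-step hidden state."""
    g = alignment_gate(hidden, direction, max_strength)
    return [h + g * d for h, d in zip(hidden, direction)]

# Toy 2-D example: the direction is computed once, then reused at each step.
direction = evidence_direction([[1.0, 0.0], [1.0, 0.0]])
aligned = steer([2.0, 0.0], direction)    # gate closes, state unchanged
drifted = steer([-2.0, 0.0], direction)   # gate opens fully
```

In this toy setup the direction is computed once and reused at every decoding step, so the per-step cost is one dot product and one vector addition. That is in the spirit of the low-overhead claim, though the actual method may differ in every detail.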
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
