Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
Zhentao He, Can Zhang, Ziheng Wu, Zhenghao Chen, Yufei Zhan, Yifan Li, Zhao Zhang, Xian Wang, Minghui Qiu

TL;DR
This paper introduces a benchmark for evaluating OCR hallucinations in multimodal large language models, especially under degraded visual conditions. It also introduces a training framework that reduces these hallucinations and improves document-understanding accuracy.
Contribution
The paper presents KIE-HVQA, the first benchmark dedicated to evaluating OCR hallucination in degraded document understanding. It also presents a GRPO-based framework whose reward mechanism mitigates such hallucinations.
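GRPO (Group Relative Policy Optimization) scores each sampled response against the other responses sampled for the same prompt, rather than using a learned value model. The following minimal sketch shows only that group-normalization step. It assumes one scalar reward per response. The function name `group_relative_advantages` and the reward values are hypothetical, and the paper's own reward mechanism, which this summary does not describe, is not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its group: A_i = (r_i - mean) / (std + eps).

    Hypothetical helper for illustration; `rewards` holds one scalar reward
    per response sampled for the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the G samples for one prompt
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for G = 4 sampled answers to one degraded-document question.
rewards = [1.0, 0.0, 0.5, 0.5]
print([round(a, 3) for a in group_relative_advantages(rewards)])
# prints [1.414, -1.414, 0.0, 0.0]
```

Responses rewarded above the group mean receive positive advantages and are reinforced. Those below the mean are pushed down. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.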
Findings
22% improvement in hallucination-free accuracy on KIE-HVQA
No significant performance drop on standard tasks
Effective mitigation of hallucinations in ambiguous regions
Abstract
Recent advances in multimodal large language models have improved document understanding by integrating textual and visual information. However, the response paradigm of existing models is incomplete in real-world scenarios, particularly under visual degradation. In such conditions, models often fail to perceive degradation and ambiguity, which leads to overreliance on linguistic priors or to misaligned visual-textual reasoning. This inability to recognize uncertainty frequently results in hallucinated content, especially when a precise answer is not feasible. To demonstrate and analyze this problem, we propose KIE-HVQA, the first benchmark dedicated to evaluating OCR hallucination in degraded document understanding. This dataset includes test samples spanning identity cards and invoices, with…
