ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding

Jianjiang Yang; Yanshu Li; Ziyan Huang

arXiv:2507.04943 · cs.CV · October 1, 2025

TL;DR

ReLoop introduces a closed-loop training framework for multimodal models that enhances their internal consistency and significantly reduces hallucinations by integrating multiple feedback mechanisms during training.

Contribution

The paper presents ReLoop, a novel training method that enforces multimodal consistency through a ring-shaped structure with integrated feedback modules, so that hallucinations are addressed inside the training loop rather than through external verification or post-hoc correction.

Findings

01. ReLoop reduces hallucination rates across multiple benchmarks.

02. The framework improves semantic and visual consistency in MLLMs.

03. ReLoop enhances interpretability through attention supervision.

Abstract

While Multimodal Large Language Models (MLLMs) have achieved remarkable progress in open-ended visual question answering, they remain vulnerable to hallucinations: outputs that contradict or misrepresent input semantics, posing a critical challenge to reliability and factual consistency. Existing methods often rely on external verification or post-hoc correction and lack an internal mechanism to validate outputs directly during training. To bridge this gap, we propose ReLoop, a unified closed-loop training framework that encourages multimodal consistency for cross-modal understanding in MLLMs. ReLoop adopts a ring-shaped structure that integrates three complementary consistency feedback mechanisms, obliging MLLMs to "see twice and think backwards". Specifically, ReLoop employs a frozen Consistency Feedback Plugin (CFP), comprising semantic reconstruction, visual…

Taxonomy

Topics: Cognitive Science and Education Research · Mental Health Research Topics