ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding
Jianjiang Yang, Yanshu Li, Ziyan Huang

TL;DR
ReLoop introduces a closed-loop training framework for multimodal models that enhances their internal consistency and significantly reduces hallucinations by integrating multiple feedback mechanisms during training.
Contribution
The paper presents ReLoop, a training method that enforces multimodal consistency through a ring-shaped structure with integrated feedback modules. It addresses hallucinations inside the training loop rather than through external verification or post-hoc correction.
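As a rough illustration of what closed-loop consistency feedback can look like, the sketch below adds a weighted consistency penalty to an ordinary task loss. It is not the paper's implementation: the function names, the cosine-distance penalty, the toy embeddings, and the weight of 0.2 are all assumptions chosen for this example, and only the "semantic reconstruction" label comes from the abstract.

```python
import math
from typing import Dict, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def closed_loop_loss(task_loss: float,
                     feedback_losses: Dict[str, float],
                     weights: Dict[str, float]) -> float:
    """Add weighted consistency penalties to the ordinary task loss."""
    penalty = sum(weights[name] * value for name, value in feedback_losses.items())
    return task_loss + penalty


# Hypothetical embeddings: the original input and a reconstruction of it
# derived from the model's own answer ("thinking backwards").
input_embedding = [1.0, 0.0]
reconstructed_embedding = [0.0, 1.0]

# Assumed feedback term; a real system would have one entry per feedback module.
feedback = {
    "semantic_reconstruction": cosine_distance(input_embedding, reconstructed_embedding)
}
total = closed_loop_loss(
    task_loss=0.5,
    feedback_losses=feedback,
    weights={"semantic_reconstruction": 0.2},  # assumed weight
)
```

The point of this design is that the penalty grows when the reconstruction drifts from the input, so an answer that misrepresents the input is penalized during training instead of being checked afterwards.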
Findings
ReLoop reduces hallucination rates across multiple benchmarks.
The framework improves semantic and visual consistency in MLLMs.
ReLoop enhances interpretability through attention supervision.
Abstract
While Multimodal Large Language Models (MLLMs) have achieved remarkable progress in open-ended visual question answering, they remain vulnerable to hallucinations: outputs that contradict or misrepresent input semantics, posing a critical challenge to reliability and factual consistency. Existing methods often rely on external verification or post-hoc correction, lacking an internal mechanism to validate outputs directly during training. To bridge this gap, we propose ReLoop, a unified closed-loop training framework that encourages multimodal consistency for cross-modal understanding in MLLMs. ReLoop adopts a ring-shaped structure that integrates three complementary consistency feedback mechanisms, obliging MLLMs to practice "seeing twice and thinking backwards". Specifically, ReLoop employs the frozen Consistency Feedback Plugin (CFP), comprising semantic reconstruction, visual…
Taxonomy
Topics: Cognitive Science and Education Research · Mental Health Research Topics
