SchröMind: Mitigating Hallucinations in Multimodal Large Language Models via Solving the Schrödinger Bridge Problem
Ziqiang Shi, Rujie Liu, Shanshan Yu, Satoshi Munakata, Koichi Shirahata

TL;DR
SchröMind introduces a novel framework that reduces hallucinations in multimodal large language models by solving the Schrödinger bridge problem, aligning hallucinatory and truthful activations with minimal computational cost.
Contribution
It presents a new method to mitigate hallucinations in MLLMs by establishing token-level mappings through Schrödinger bridge solutions, preserving model capabilities.
Findings
Achieves state-of-the-art results on POPE and MME benchmarks.
Demonstrates minimal computational overhead.
Effectively reduces hallucinations in high-stakes applications.
Abstract
Recent advancements in Multimodal Large Language Models (MLLMs) have achieved significant success across various domains. However, their use in high-stakes fields like healthcare remains limited due to persistent hallucinations, where generated text contradicts or ignores visual input. We contend that MLLMs can comprehend images but struggle to produce accurate token sequences. Minor perturbations can shift attention from truthful to untruthful states, and the autoregressive nature of text generation often prevents error correction. To address this, we propose SchröMind, a novel framework that reduces hallucinations by solving the Schrödinger bridge problem. It establishes a token-level mapping between hallucinatory and truthful activations with minimal transport cost through lightweight training, while preserving the model's original capabilities. Extensive experiments on the POPE and…
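To make the idea of a minimal-transport-cost mapping between two sets of activations more concrete, here is a minimal sketch of the static Schrödinger bridge problem, which is equivalent to entropy-regularized optimal transport and can be solved with Sinkhorn iterations. This is an illustration only, not the authors' training procedure, which this summary does not describe. The function names (`sinkhorn_plan`, `barycentric_map`), the regularization value `eps`, and the synthetic "hallucinatory" and "truthful" activations are all assumptions made for the example.

```python
import numpy as np

def sinkhorn_plan(x, y, eps=5.0, n_iters=500):
    """Entropic OT plan between uniform empirical measures on x and y.

    Hypothetical helper: solves the static Schrodinger bridge between two
    point clouds with a squared-Euclidean cost.
    """
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)  # uniform weights on source points
    b = np.full(m, 1.0 / m)  # uniform weights on target points
    # Pairwise squared Euclidean distances, shape (n, m).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)      # rescale rows toward marginal a
        v = b / (K.T @ u)    # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

def barycentric_map(plan, y):
    """Send each source point to the plan-weighted average of the targets."""
    return (plan @ y) / plan.sum(axis=1, keepdims=True)

# Synthetic stand-ins for activations (assumption: real ones would come
# from a model's hidden states, which are not available here).
rng = np.random.default_rng(0)
halluc = rng.normal(loc=0.0, scale=1.0, size=(64, 8))
truth = rng.normal(loc=2.0, scale=1.0, size=(64, 8))

plan = sinkhorn_plan(halluc, truth)
steered = barycentric_map(plan, truth)
```

Here `steered` holds one transported vector per "hallucinatory" input, each a weighted average of "truthful" points. A smaller `eps` gives a sharper, closer-to-deterministic coupling at the cost of slower convergence and a higher risk of numerical underflow.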
Taxonomy
Topics: Mental Health via Writing · Misinformation and Its Impacts · Machine Learning in Healthcare
