Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models

Songlin Yang; Xianghao Kong; Anyi Rao

arXiv:2604.10949 · cs.CV · April 14, 2026


TL;DR

This paper introduces an information-theoretic probing framework to analyze why unified multimodal models often fail to achieve true synergy. It traces the failure to divergent encoding and response patterns across modalities.

Contribution

The paper presents a probing method that jointly analyzes how unified multimodal models encode inputs and generate outputs. The method exposes internal causes of pseudo-unification and underscores the importance of consistent information flow for genuine multimodal integration.

Findings

01. Pseudo-unification results from modality-asymmetric encoding and pattern-split responses.

02. Models with unified encoding and response patterns achieve better reasoning and generation.

03. The framework provides the first internal analysis linking information divergence to multimodal model performance.

Abstract

Unified multimodal models (UMMs) were designed to combine the reasoning ability of large language models (LLMs) with the generation capability of vision models. In practice, however, this synergy remains elusive: UMMs fail to transfer LLM-like reasoning to image synthesis and exhibit divergent response behaviors. We term this phenomenon pseudo-unification. Diagnosing its internal causes is important, but existing probing methods either lack model-internal insight or ignore prompt-response dependencies. To address these limitations, we propose an information-theoretic probing framework that jointly analyzes how UMMs encode inputs and generate outputs. Applied to ten representative UMMs, our framework reveals that pseudo-unification stems from a dual divergence: (i) Modality-Asymmetric Encoding, where vision and language follow different entropy trajectories, and (ii) Pattern-Split…
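
To make the phrase "entropy trajectory" concrete, here is a minimal sketch. It is not the paper's method, which this page does not describe in detail. It assumes that each layer can be summarized by a vector of logits over a small vocabulary, computes the Shannon entropy of the softmax distribution at each layer, and compares two trajectories by their mean absolute gap. The function names (`entropy_trajectory`, `trajectory_gap`) and the toy logits are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_trajectory(per_layer_logits):
    """Entropy of the softmax distribution at each layer (hypothetical helper)."""
    return [shannon_entropy(softmax(layer)) for layer in per_layer_logits]


def trajectory_gap(traj_a, traj_b):
    """Mean absolute difference between two equal-length trajectories."""
    if len(traj_a) != len(traj_b):
        raise ValueError("trajectories must have the same number of layers")
    return sum(abs(a - b) for a, b in zip(traj_a, traj_b)) / len(traj_a)


# Toy data (an assumption, not measurements from any model):
# the "text" logits sharpen with depth, the "image" logits stay flat.
text_logits = [
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [6.0, 0.0, 0.0, 0.0],
]
image_logits = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

text_traj = entropy_trajectory(text_logits)
image_traj = entropy_trajectory(image_logits)
gap = trajectory_gap(text_traj, image_traj)
```

In this toy setup the text trajectory falls with depth while the image trajectory stays at the uniform-distribution maximum, so the gap is positive. A real probe would need per-layer distributions extracted from an actual model, and how to obtain those is outside what this page states.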
