Anisotropic Modality Align

Xiaomin Yu; Yijiang Li; Yuhui Zhang; Hanzhen Zhao; Yue Yang; Hao Tang; Yue Song; Xiaobin Hu; Chengwei Qin; Shuicheng Yan; Hui Xiong

arXiv:2605.07825 · cs.MM · May 11, 2026

TL;DR

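This paper examines the geometry of the modality gap in pretrained multimodal contrastive models. It finds that the main obstacle to interchanging modality representations is an anisotropic residual concentrated along a few dominant directions, not a simple global shift, and it proposes AnisoAlign, a correction framework that uses geometric priors to improve modality alignment.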

Contribution

It uncovers the anisotropic structure of the modality gap and introduces AnisoAlign, a geometric correction method that improves the training of multimodal models on unimodal data.

Findings

01 Modality representations share compatible semantic geometry.

02 The modality gap is due to anisotropic residuals along dominant directions.

03 AnisoAlign improves geometric alignment and text-only multimodal training.

Abstract

Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the shared representation space of pretrained multimodal contrastive models can serve as a bridge, enabling models to perform multimodal training with unimodal data. However, the key premise of this paradigm remains insufficiently understood: can representations from different modalities be reliably interchanged? The core obstacle lies in the persistent Modality Gap in the shared space. In this work, we revisit the geometric nature of the modality gap. We find that modality representations already share compatible dominant semantic geometry. What truly hinders modality interchangeability is not a simple global shift, but an anisotropic residual structure concentrated along a small number of dominant directions. Based on this finding, we further…
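
The distinction the abstract draws between a simple global shift and an anisotropic residual can be illustrated with a small diagnostic. The sketch below is not the paper's AnisoAlign method; it is a toy example on synthetic data, and the function names (`residual_spectrum`, `make_toy_pairs`), the dimensions, and the noise scales are all assumptions chosen for illustration. It subtracts the mean shift between paired embeddings and then checks how much of the remaining residual variance sits in a few directions.

```python
import numpy as np


def residual_spectrum(image_emb, text_emb):
    """Return the mean shift and the explained-variance ratio of the
    centered paired residuals (image minus text)."""
    residuals = image_emb - text_emb
    mean_shift = residuals.mean(axis=0)        # the "global shift" part
    centered = residuals - mean_shift          # what a pure shift cannot explain
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return mean_shift, variances / variances.sum()


def make_toy_pairs(n=500, dim=32, seed=0):
    """Hypothetical synthetic pairs: a constant shift plus a residual that
    lives mostly in two directions, plus small isotropic noise."""
    rng = np.random.default_rng(seed)
    text_emb = rng.normal(size=(n, dim))
    shift = np.full(dim, 0.5)
    # Orthonormal basis; the first two columns play the dominant directions.
    directions, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    coeffs = rng.normal(size=(n, 2)) * np.array([3.0, 2.0])
    anisotropic = coeffs @ directions[:, :2].T
    noise = 0.05 * rng.normal(size=(n, dim))
    return text_emb + shift + anisotropic + noise, text_emb


image_emb, text_emb = make_toy_pairs()
mean_shift, ratio = residual_spectrum(image_emb, text_emb)
top2 = ratio[:2].sum()
print(f"share of residual variance in top 2 directions: {top2:.3f}")
```

In this toy setup, removing the mean shift alone leaves a residual whose variance is dominated by the two planted directions, which is the kind of structure the abstract describes. Real embeddings from a contrastive model would need to be substituted for `make_toy_pairs` to say anything about an actual modality gap.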

Code & Models

Repositories

yu-xm/Modality_Gap_Theory
github
