The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?

Zhaoyang Zhang; Run Shao; Dongyue Wu; Jiajie Teng; Chao Tao; Jingdong Chen; Haifeng Li

arXiv:2605.09352·cs.AI·May 12, 2026

TL;DR

This paper introduces an asymmetric analysis method and uses it to show that non-language modalities tend to converge toward language-like representations, suggesting that language acts as an attractor in multimodal learning.

Contribution

The paper presents directional convergence analysis based on cycle-kNN, uncovers asymmetric convergence patterns across independently trained unimodal models, and proposes the Wittgensteinian Representation Hypothesis.

Findings

01. Non-language modalities move toward language representations more than vice versa.

02. Symmetric similarity measures fail to detect this directional asymmetry.

03. Language representations occupy the most compact regions of the representational space.

Abstract

Understanding why independently trained neural networks from different modalities converge toward shared representations, and where this convergence leads, remains an open question in representation learning. All existing evidence relies on symmetric similarity measures, which can detect convergence but are structurally blind to its direction. We introduce directional convergence analysis using cycle-kNN, an asymmetric alignment measure, applied across dozens of independently trained unimodal models spanning point clouds, vision, and language. We uncover a consistent directional asymmetry: non-language modalities move toward the neighborhood structure of language significantly more than the reverse, and this pattern holds across all model families and scales, yet is entirely invisible to symmetric measures. Mechanistic analysis traces the directionality to feature density asymmetry,…
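The abstract does not spell out how cycle-kNN is computed, so the following is only a minimal sketch of one plausible cycle-consistency reading, not the authors' definition. It assumes paired features (row i of both matrices describes the same sample), cosine similarity, and a hypothetical function name `cycle_knn`: each sample steps to its nearest neighbor in space A, and the cycle counts as closed if the sample appears among that neighbor's k nearest neighbors in space B. Because the two spaces play different roles, swapping the arguments can give a different score, which is the kind of asymmetry a symmetric measure cannot express.

```python
import numpy as np


def knn_indices(feats: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors (cosine similarity), excluding self."""
    x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # a point is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def cycle_knn(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 10) -> float:
    """Hypothetical directional score from space A to space B (an assumption).

    For each sample i: take its nearest neighbor j in A, then check whether
    i is among the k nearest neighbors of j in B. Returns the fraction of
    samples for which this cycle closes.
    """
    nn_a = knn_indices(feats_a, 1)[:, 0]   # j = nearest neighbor of i in A
    knn_b = knn_indices(feats_b, k)        # k nearest neighbors in B
    idx = np.arange(feats_a.shape[0])
    hits = (knn_b[nn_a] == idx[:, None]).any(axis=1)
    return float(hits.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for paired features from two unimodal models.
    vision = rng.normal(size=(200, 32))
    language = vision[:, :16] + 0.5 * rng.normal(size=(200, 16))
    print("vision -> language:", cycle_knn(vision, language, k=10))
    print("language -> vision:", cycle_knn(language, vision, k=10))
```

Comparing the two printed scores on real paired embeddings would indicate whether the neighborhood structure of one space is better preserved in the other than the reverse; the synthetic data here is illustrative only and says nothing about the paper's results.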
