The Indra Representation Hypothesis for Multimodal Alignment

Jianglin Lu, Hailing Wang, Kuo Yang, Yitian Zhang, Simon Jenni, and Yun Fu

arXiv:2604.04496 · cs.CV · April 7, 2026

TL;DR

This paper introduces the Indra Representation Hypothesis and proposes a representation built on relational structure, which improves robustness and alignment across unimodal foundation models in vision, language, and audio.

Contribution

The paper formalizes the Indra representation using category theory and demonstrates its effectiveness for training-free cross-modal and cross-architecture alignment.

Findings

01 Indra representations enhance robustness across models and modalities.

02 The approach is theoretically grounded and improves alignment without additional training.

03 Experiments show consistent performance gains in diverse scenarios.

Abstract

Recent studies have uncovered an interesting phenomenon: unimodal foundation models tend to learn convergent representations, regardless of differences in architecture, training objectives, or data modalities. However, these representations are essentially internal abstractions of samples that characterize samples independently, leading to limited expressiveness. In this paper, we propose The Indra Representation Hypothesis, inspired by the philosophical metaphor of Indra's Net. We argue that representations from unimodal foundation models are converging to implicitly reflect a shared relational structure underlying reality, akin to the relational ontology of Indra's Net. We formalize this hypothesis using the V-enriched Yoneda embedding from category theory, defining the Indra representation as a relational profile of each sample with respect to others. This formulation is shown to be…
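
To make the idea of a "relational profile of each sample with respect to others" concrete, here is a minimal sketch. It is not taken from the paper or its repository. The choice of cosine distance, the function names, and the toy embeddings are all assumptions made for illustration; the abstract does not state which cost function the authors use. The sketch shows why a relational profile can be compared across models even when their embedding spaces differ in dimension: each sample is described only by its distances to the other samples.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two vectors (assumed cost; not from the paper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def relational_profiles(embeddings):
    """Row i holds the distances from sample i to every sample in the set."""
    return [[cosine_distance(x, y) for y in embeddings] for x in embeddings]


def match_by_profile(profiles_a, profiles_b):
    """For each profile in A, return the index of the nearest profile in B (L2).

    Assumes both profile sets are computed against the same samples in the
    same order, so that profile coordinates are comparable.
    """
    def l2(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    return [
        min(range(len(profiles_b)), key=lambda j: l2(p, profiles_b[j]))
        for p in profiles_a
    ]


# Hypothetical embeddings of the same three samples from two different models.
# Model B's space has a different dimension, axis order, and scale.
model_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
model_b = [[0.0, 0.0, 2.0], [0.0, 2.0, 0.0], [0.0, 2.0, 2.0]]

profiles_a = relational_profiles(model_a)
profiles_b = relational_profiles(model_b)

# No training step is involved: matching uses only the pairwise structure.
matches = match_by_profile(profiles_a, profiles_b)
```

In this toy case the raw embeddings cannot be compared directly (2 versus 3 dimensions), but the two profile matrices agree up to floating-point error, so each sample in model A is matched to its counterpart in model B. Real foundation-model embeddings would agree only approximately, and a different cost function might suit them better.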

Code & Models

Repositories

Jianglin954/Indra (GitHub)
