The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems

Xiaoze Liu; Ruowang Zhang; Weichen Yu; Siheng Xiong; Liu He; Feijie Wu; Hoin Jung; Matt Fredrikson; Xiaoqian Wang; Jing Gao

arXiv:2602.15382 · cs.CL · February 18, 2026

Open Access

TL;DR

The paper introduces the Vision Wormhole, a framework for model-agnostic, text-free communication in multi-agent systems. It repurposes the visual interface of Vision-Language Models (VLMs) to carry reasoning traces between heterogeneous models.

Contribution

It proposes a universal visual codec and a hub-and-spoke topology to facilitate scalable, high-bandwidth inter-agent communication without relying on pair-specific translators.
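
The scalability argument can be illustrated with a small counting sketch. It rests on assumptions that are not taken from the paper: that a pair-specific scheme needs one learned translator per ordered sender-receiver pair, and that a hub-and-spoke scheme needs one encoder into the shared space and one decoder out of it per model. The function names are hypothetical.

```python
# Illustrative count of learned components, under assumed cost models.
# Assumption: pair-specific translation needs one translator per ordered pair.
# Assumption: hub-and-spoke needs one encoder and one decoder per model.

def pairwise_translators(n_models: int) -> int:
    """Translators needed if every ordered (sender, receiver) pair has its own."""
    return n_models * (n_models - 1)


def hub_and_spoke_adapters(n_models: int) -> int:
    """Adapters needed if each model only maps to and from a shared space."""
    return 2 * n_models


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, pairwise_translators(n), hub_and_spoke_adapters(n))
```

Under these assumptions the pairwise count grows quadratically while the hub-and-spoke count grows linearly. The two meet at three models, so the benefit only appears as more model families are added.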

Findings

01 Reduces communication overhead and runtime in multi-agent reasoning.

02 Maintains reasoning accuracy comparable to text-based communication.

03 Demonstrates effectiveness across diverse model architectures.

Abstract

Multi-Agent Systems (MAS) powered by Large Language Models have unlocked advanced collaborative reasoning, yet they remain shackled by the inefficiency of discrete text communication, which imposes significant runtime overhead and information quantization loss. While latent state transfer offers a high-bandwidth alternative, existing approaches either assume homogeneous sender-receiver architectures or rely on pair-specific learned translators, limiting scalability and modularity across diverse model families with disjoint manifolds. In this work, we propose the Vision Wormhole, a novel framework that repurposes the visual interface of Vision-Language Models (VLMs) to enable model-agnostic, text-free communication. By introducing a Universal Visual Codec, we map heterogeneous reasoning traces into a shared continuous latent space and inject them directly into the receiver's visual…

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling