Mind the Gap No More: Achieving Zero-Gap Multimodal Integration via One Tokenizer

Yanan Li; Christina Yi Jin; Yuan Jin; Manli Luo; Tie Xu; Shuai Jiao; Wei He; Qing Zhang

arXiv:2602.12286 · q-bio.GN · May 12, 2026

TL;DR

This paper introduces One Tokenizer, a unified architecture for multimodal integration in large language models. It eliminates the modality gap by mapping all modalities into a shared token space, which improves performance on biological reasoning tasks such as DNA-text tasks.

Contribution

The paper provides a theoretical characterization of the modality gap and proposes a native architecture that maps all modalities into a shared token space, achieving zero-gap integration.
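To make the idea of a shared token space concrete, here is a minimal Python sketch under stated assumptions. DNA is split into non-overlapping 3-mers, each k-mer becomes an ordinary vocabulary entry next to the text tokens, and both modalities index one embedding table. The k-mer scheme, the `<dna>` prefix, and every function name below are hypothetical illustrations, not the paper's actual tokenizer.

```python
from itertools import product
import random
from typing import Dict, List

# Hypothetical marker that keeps DNA k-mer tokens distinct from text tokens.
DNA_PREFIX = "<dna>"


def build_unified_vocab(text_tokens: List[str], k: int = 3) -> Dict[str, int]:
    """Assign one contiguous id space to text tokens and all 4**k DNA k-mers."""
    vocab: Dict[str, int] = {}
    for tok in text_tokens:
        vocab.setdefault(tok, len(vocab))
    for kmer in product("ACGT", repeat=k):
        vocab.setdefault(DNA_PREFIX + "".join(kmer), len(vocab))
    return vocab


def encode_text(words: List[str], vocab: Dict[str, int]) -> List[int]:
    """Look up ids for pre-split text tokens."""
    return [vocab[w] for w in words]


def encode_dna(seq: str, vocab: Dict[str, int], k: int = 3) -> List[int]:
    """Split a DNA sequence into non-overlapping k-mers and look up their ids."""
    seq = seq.upper()
    if len(seq) % k != 0:
        raise ValueError("sequence length must be a multiple of k in this sketch")
    return [vocab[DNA_PREFIX + seq[i:i + k]] for i in range(0, len(seq), k)]


def make_embedding_table(vocab_size: int, dim: int, seed: int = 0) -> List[List[float]]:
    """One embedding table shared by every modality (random init for illustration)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]


text_vocab = ["<bos>", "the", "promoter", "is", ":"]
vocab = build_unified_vocab(text_vocab, k=3)
table = make_embedding_table(len(vocab), dim=8)

# A mixed prompt: text tokens followed by DNA tokens, all in one id sequence.
ids = encode_text(["<bos>", "the", "promoter", "is", ":"], vocab) + encode_dna("ATGCGTTAA", vocab)
# The same lookup serves both modalities; there is no separate DNA encoder.
hidden = [table[i] for i in ids]
```

Because DNA and text tokens share one id space and one embedding table, no modality-specific encoder or projection layer can place one modality's inputs in a separate region of the hidden space. That is the property a unified vocabulary is meant to provide.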

Findings

1. One Tokenizer outperforms encoder-based models on DNA-text tasks.

2. A unified token space enables deeper cross-modal reasoning.

3. Theoretical analysis shows that native architectures with a unified vocabulary maintain a zero-gap state across all hidden layers.

Abstract

A central challenge in developing Multimodal Large Language Models (MLLMs) is effectively integrating heterogeneous inputs into a cohesive reasoning engine. Current paradigms predominantly rely on modular architectures that introduce modality-specific encoders and cross-modal fusion mechanisms. However, these designs are fundamentally bottlenecked by a geometric modality gap, forcing the LLM to expend significant computational capacity on geometric reconciliation rather than deep cross-modal reasoning. In this work, we formally characterize this modality gap and theoretically demonstrate that native architectures, specifically those employing a unified vocabulary, intrinsically maintain a zero-gap state across all hidden layers. Guided by these theoretical findings, we propose One Tokenizer, a native architecture that maps all modalities directly into a shared token space. We…
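This excerpt does not state the paper's formal definition of the modality gap. As an illustration only, the sketch below uses a common proxy from prior work (Liang et al., 2022): the Euclidean distance between the centroids of L2-normalized embeddings from two modalities. The embeddings are synthetic random data, and `modality_gap` is a hypothetical helper, not the authors' metric.

```python
import numpy as np


def modality_gap(a: np.ndarray, b: np.ndarray) -> float:
    """Distance between centroids of L2-normalized embeddings of two modalities.

    a: (n, d) array for modality A; b: (m, d) array for modality B.
    """
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.linalg.norm(a_unit.mean(axis=0) - b_unit.mean(axis=0)))


rng = np.random.default_rng(0)
dim = 64
text_emb = rng.normal(size=(500, dim))
# Synthetic "aligned" modality: drawn from the same distribution as text_emb.
aligned_emb = rng.normal(size=(500, dim))
# Synthetic "gapped" modality: same spread but shifted, as if a separate encoder
# had placed its outputs in a different region of the space.
shifted_emb = rng.normal(size=(500, dim)) + 3.0

print(f"aligned gap: {modality_gap(text_emb, aligned_emb):.3f}")
print(f"shifted gap: {modality_gap(text_emb, shifted_emb):.3f}")
```

The aligned pair differs only by sampling noise, so its gap stays small, while the shifted pair yields a much larger value. Under this proxy, a "zero-gap" state means the two modalities' embedding centroids coincide.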