Dynin-Omni: Omnimodal Unified Large Diffusion Language Model

Jaeik Kim; Woojin Kim; Jihwan Hong; Yejoon Lee; Sieun Hyeon; Mintaek Lim; Yunseok Han; Dogeun Kim; Hoeun Lee; Hyunggeun Kim; Jaeyoung Do

arXiv:2604.00007·cs.CL·April 2, 2026

TL;DR

Dynin-Omni is a masked-diffusion-based model that unifies text, image, and speech understanding and generation, together with video understanding, within a single architecture, enabling iterative refinement across modalities.

Contribution

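It introduces the first masked-diffusion-based omnimodal foundation model, formulating omnimodal modeling as masked diffusion over a shared discrete token space rather than serializing modalities autoregressively or relying on external modality-specific decoders. The authors report gains over existing open-source unified models across multiple tasks.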
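To make the idea of a shared discrete token space concrete, here is a minimal sketch of one common way to build it: each modality's tokenizer keeps its own local IDs, and a fixed offset per modality maps them into one non-overlapping vocabulary. The codebook sizes, the names `MODALITY_SIZES`, `to_shared`, and `from_shared`, and the disjoint-range layout itself are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical codebook sizes; the paper's actual vocabularies are not stated here.
MODALITY_SIZES = {"text": 1000, "image": 512, "speech": 256}


def build_offsets(sizes):
    """Assign each modality a contiguous, non-overlapping ID range."""
    offsets, start = {}, 0
    for name, size in sizes.items():
        offsets[name] = start
        start += size
    return offsets, start  # `start` ends up as the total shared-vocabulary size


OFFSETS, VOCAB_SIZE = build_offsets(MODALITY_SIZES)
MASK_ID = VOCAB_SIZE  # one extra ID reserved for the mask token (assumption)


def to_shared(modality, local_ids):
    """Map modality-local token IDs into the shared vocabulary."""
    size = MODALITY_SIZES[modality]
    for i in local_ids:
        if not 0 <= i < size:
            raise ValueError(f"{modality} token {i} is outside [0, {size})")
    return [OFFSETS[modality] + i for i in local_ids]


def from_shared(shared_id):
    """Recover (modality, local_id) from a shared-vocabulary ID."""
    for name, offset in OFFSETS.items():
        if offset <= shared_id < offset + MODALITY_SIZES[name]:
            return name, shared_id - offset
    raise ValueError(f"{shared_id} is not a modality token")


# One interleaved sequence that a single model could attend over.
sequence = (
    to_shared("text", [5, 7])
    + to_shared("image", [0, 511])
    + to_shared("speech", [3])
)
```

With a layout like this, masking and prediction operate on plain integer IDs, so the same objective applies regardless of which modality a position belongs to.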

Findings

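01 Reports results on 19 multimodal benchmarks spanning language reasoning, image generation and editing, video understanding, and speech recognition and synthesis, including 87.6 on GSM8K.

02 Outperforms existing open-source unified models across multiple tasks.

03 Supports masked diffusion as a unified modeling paradigm across modalities.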

Abstract

We present Dynin-Omni, the first masked-diffusion-based omnimodal foundation model that unifies text, image, and speech understanding and generation, together with video understanding, within a single architecture. Unlike autoregressive unified models that serialize heterogeneous modalities, or compositional unified models that require orchestration with external modality-specific decoders, Dynin-Omni natively formulates omnimodal modeling as masked diffusion over a shared discrete token space, enabling iterative refinement under bidirectional context. Dynin-Omni adopts a multi-stage training strategy with model-merging-based modality expansion and omnimodal alignment. We evaluate Dynin-Omni across 19 multimodal benchmarks spanning language reasoning, image generation and editing, video understanding, and speech recognition and synthesis. Dynin-Omni achieves 87.6 on GSM8K, 1733.6 on…
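The phrase "iterative refinement under bidirectional context" can be illustrated with a toy decoding loop. The sketch below starts from a fully masked sequence and, at each step, commits the most confident proposals while leaving the rest masked for later steps. This confidence-ranked unmasking schedule is a generic sampler used with masked generative models and is an assumption here, not the paper's confirmed procedure. `toy_denoiser` is a hypothetical stand-in for the network: it reads the answer from `target` and attaches random confidences, where a real model would predict from the visible tokens on both sides.

```python
import math
import random

MASK = -1  # hypothetical sentinel marking a masked position


def toy_denoiser(tokens, target, rng):
    """Stand-in for a bidirectional model (assumption).

    Returns {position: (proposed_token, confidence)} for every masked position.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def iterative_unmask(target, num_steps=4, seed=0):
    """Decode by unmasking a growing subset of positions over num_steps."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    history = []
    for step in range(num_steps):
        proposals = toy_denoiser(tokens, target, rng)
        if not proposals:
            break
        # Spread the remaining masked positions evenly over the remaining steps.
        k = math.ceil(len(proposals) / (num_steps - step))
        # Commit the k most confident proposals; the rest stay masked.
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:k]:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history


final, history = iterative_unmask([3, 1, 4, 1, 5, 9, 2, 6])
masked_per_step = [state.count(MASK) for state in history]
```

Unlike left-to-right decoding, every step here can fill positions anywhere in the sequence, which is what lets later proposals condition on tokens both before and after them.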

Code & Models

Repositories

aidaslab/Dynin-Omni (GitHub)

Models

snu-aidas/Dynin-Omni (Hugging Face model, 10k downloads, 19 likes)
