LatentUMM: Dual Latent Alignment for Unified Multimodal Models

Yinyi Luo; Wenwen Wang; Hayes Bai; Marios Savvides; Jindong Wang

arXiv:2605.17766 · cs.CV · May 19, 2026

TL;DR

LatentUMM introduces a dual latent alignment framework to explicitly align transformations in shared latent spaces, enhancing cross-modal consistency in unified multimodal models.

Contribution

The paper proposes a dual latent alignment approach, combined with latent dynamics stabilization, to improve semantic consistency in unified multimodal models.

Findings

01. Improves cross-modal semantic consistency across architectures

02. Enhances robustness via stochastic latent rollouts

03. Achieves better alignment between generation and re-encoding processes

Abstract

Unified multimodal models (UMMs) achieve strong performance in both understanding and generation by learning a shared latent space, yet they often exhibit functional inconsistency between these two capabilities. We observe that this issue does not stem from a lack of shared representations, but from the absence of explicit alignment between the transformations that map into and out of the latent space. As a result, generation and re-encoding can follow inconsistent trajectories, leading to semantic drift under modality transitions. In this work, we propose LatentUMM, a framework that constructs an enhanced shared latent space to explicitly align these transformations and improve cross-modal consistency. LatentUMM consists of two stages. First, dual latent alignment enforces consistency at both the modality and capacity levels: cross-modal alignment uses a stronger embedding model to…
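To make the notion of semantic drift between generation and re-encoding concrete, the following is a minimal toy sketch, not the LatentUMM method. It assumes linear maps stand in for the generator and the encoder, and it measures drift as one minus the cosine similarity between a latent and its re-encoded round trip. All names here (`cycle_drift`, `W_dec`, `W_enc_aligned`, `W_enc_misaligned`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cycle_drift(z, decode, encode):
    """Drift of one latent after a generate -> re-encode round trip.

    Returns 1 - cos(z, encode(decode(z))); a value near 0 means the
    two transformations are consistent for this latent.
    """
    z_cycle = encode(decode(z))
    return 1.0 - cosine_similarity(z, z_cycle)


rng = np.random.default_rng(0)
latent_dim, output_dim = 8, 32

# Toy "generator" (assumption): a linear map from latent to output space.
W_dec = rng.normal(size=(output_dim, latent_dim))

# Aligned toy encoder: the pseudo-inverse of the generator, so the
# round trip recovers the latent up to floating-point error.
W_enc_aligned = np.linalg.pinv(W_dec)

# Misaligned toy encoder: the same map with a random perturbation, so
# generation and re-encoding no longer follow consistent trajectories.
W_enc_misaligned = W_enc_aligned + 0.05 * rng.normal(size=W_enc_aligned.shape)


def decode(z):
    return W_dec @ z


latents = rng.normal(size=(100, latent_dim))

# Average drift over a batch of random latents for each encoder.
aligned_drift = float(np.mean(
    [cycle_drift(z, decode, lambda x: W_enc_aligned @ x) for z in latents]
))
misaligned_drift = float(np.mean(
    [cycle_drift(z, decode, lambda x: W_enc_misaligned @ x) for z in latents]
))

print(f"aligned drift:    {aligned_drift:.2e}")
print(f"misaligned drift: {misaligned_drift:.2e}")
```

In this toy setting both encoders map into the same latent space, yet only the one aligned with the generator keeps the round trip consistent. That mirrors the abstract's point that a shared representation alone does not guarantee consistency between the transformations into and out of it.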

Code & Models

Repositories

AIFrontierLab/TorchUMM/tree/main/src/umm/post_training/LatentUMM (GitHub)
