UniGame: Turning a Unified Multimodal Model Into Its Own Adversary

Zhaolong Su; Wang Lu; Hao Chen; Sharon Li; Jindong Wang

arXiv:2511.19413 · cs.LG · March 31, 2026

TL;DR

UniGame introduces a self-adversarial post-training method for unified multimodal models, significantly enhancing their understanding, generation, and robustness by actively challenging their own representations.

Contribution

It presents a lightweight, architecture-agnostic framework that improves UMMs through adversarial self-play. The framework adds less than 1% additional parameters and is compatible with existing methods.

Findings

01. Improves consistency by +4.6% on GenEval

02. Enhances understanding by +3.6% and generation by +0.02

03. Boosts robustness by +4.8% and +6.2% on NaturalBench and AdVQA

Abstract

Unified Multimodal Models (UMMs) have shown impressive performance in both understanding and generation with a single architecture. However, UMMs still exhibit a fundamental inconsistency: understanding favors compact embeddings, whereas generation favors reconstruction-rich representations. This structural trade-off produces misaligned decision boundaries, degraded cross-modal coherence, and heightened vulnerability under distributional and adversarial shifts. In this paper, we present UniGame, a self-adversarial post-training framework that directly targets the inconsistencies. By applying a lightweight perturber at the shared token interface, UniGame enables the generation branch to actively seek and challenge fragile understanding, turning the model itself into its own adversary. Experiments demonstrate that UniGame significantly improves the consistency (+4.6%). Moreover, it also…
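The self-adversarial idea in the abstract can be illustrated with a toy minimax loop. The sketch below is not the UniGame implementation. It assumes a linear binary classifier as a stand-in for the understanding branch, a plain list of floats as a stand-in for a shared token embedding, and a one-step sign-gradient perturbation bounded by `eps` as a stand-in for the learned lightweight perturber. All names (`perturb`, `train_step`, `eps`, `lr`) are hypothetical.

```python
import math


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, x, y):
    """Binary cross-entropy of a linear head on one embedding (toy 'understanding' loss)."""
    p = min(max(sigmoid(dot(w, x)), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign(g):
    return (g > 0) - (g < 0)


def perturb(w, x, y, eps):
    """Adversary (assumed form): shift x inside an L-infinity ball of radius eps
    in the direction that increases the loss. For this model, d loss / d x_i = (p - y) * w_i."""
    p = sigmoid(dot(w, x))
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


def train_step(w, x, y, eps, lr):
    """Defender: one gradient-descent step on the clean embedding, then one on the
    perturbed embedding. For this model, d loss / d w_i = (p - y) * x_i."""
    x_adv = perturb(w, x, y, eps)
    for sample in (x, x_adv):
        p = sigmoid(dot(w, sample))
        w = [wi - lr * (p - y) * si for wi, si in zip(w, sample)]
    return w


if __name__ == "__main__":
    w, x, y = [0.0, 0.0], [1.0, 1.0], 1
    for _ in range(20):
        w = train_step(w, x, y, eps=0.1, lr=0.5)
    x_adv = perturb(w, x, y, eps=0.1)
    print("clean loss:", loss(w, x, y))
    print("adversarial loss:", loss(w, x_adv, y))
```

The adversary raises the loss within a small budget and the model is then updated on both the clean and the perturbed input. This alternation is the general pattern of adversarial training; how UniGame couples its perturber to the generation branch is not specified in the text above.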
