AIA: Rethinking Architecture Decoupling Strategy In Unified Multimodal Model

Dian Zheng; Manyuan Zhang; Hongyu Li; Kai Zou; Hongbo Liu; Ziyu Guo; Kaituo Feng; Yexin Liu; Ying Luo; Hongsheng Li

arXiv:2511.22663 · cs.CV · May 13, 2026

TL;DR

This paper introduces the Attention Interaction Alignment (AIA) loss, which improves unified multimodal models by aligning task-specific cross-modal interaction patterns rather than relying on excessive architecture decoupling.

Contribution

The paper proposes the AIA loss, which mitigates conflicts between understanding and generation in unified multimodal models without architecture decoupling and improves both capabilities.

Findings

01. AIA refines cross-modal attention patterns effectively.

02. Applying AIA boosts performance in both generation and understanding tasks.

03. Decoupling drives models toward task-specific interaction patterns, which AIA aims to align.
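The page does not give the AIA loss formula, so the following is only a minimal sketch of the general idea behind the findings: measure how much attention crosses modalities in each layer, then penalize the gap to a task-specific target pattern. Every name here (`cross_modal_share`, `alignment_loss`, `target_shares`) is hypothetical, and the squared-error penalty is an assumption made for illustration, not the authors' formulation.

```python
from typing import List, Sequence


def cross_modal_share(attn: Sequence[Sequence[float]],
                      is_image: Sequence[bool]) -> float:
    """Mean fraction of each query token's attention that lands on
    key tokens of the other modality (hypothetical statistic).

    attn[q][k] is the attention weight from query q to key k; each row
    is assumed to sum to 1. is_image[i] marks token i as an image token.
    """
    total = 0.0
    for q, row in enumerate(attn):
        total += sum(w for k, w in enumerate(row) if is_image[k] != is_image[q])
    return total / len(attn)


def alignment_loss(layer_attn: Sequence[Sequence[Sequence[float]]],
                   is_image: Sequence[bool],
                   target_shares: Sequence[float]) -> float:
    """Mean squared gap between each layer's cross-modal share and an
    assumed task-specific target for that layer (illustrative only)."""
    shares: List[float] = [cross_modal_share(a, is_image) for a in layer_attn]
    return sum((s - t) ** 2 for s, t in zip(shares, target_shares)) / len(shares)


# Toy case: one layer, token 0 is text, token 1 is image.
attn = [[0.75, 0.25],
        [0.50, 0.50]]
modality = [False, True]
print(cross_modal_share(attn, modality))          # 0.375
print(alignment_loss([attn], modality, [0.875]))  # 0.25
```

In a real training setup such a term would be computed on differentiable attention tensors and added to the task loss with a weighting factor; where the per-layer targets come from is not stated on this page.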

Abstract

Unified multimodal models for image generation and understanding represent a significant step toward AGI and have attracted widespread attention from researchers. The main challenge lies in establishing an optimal training paradigm, because understanding and generation tasks have inherently conflicting targets. To alleviate these conflicts and pursue higher performance, many researchers adopt varying degrees of architecture decoupling (e.g., double image encoders, MoE/MoT architectures, or a frozen MLLM). However, excessive model decoupling can cost the model its interleaved generation ability, undermining the original intent of unified models. In this work, we explore how to mitigate task conflicts without resorting to model decoupling. First, we analyze why decoupling boosts performance by studying the cross-modal attention behavior of models. We observe…

Code & Models

Repositories

zhengdian1/AIA (GitHub)

Models

zhengli1013/AIA (Hugging Face model)
