ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction

Jiangtong Tan, Lin Liu, Jie Huanng, Xiaopeng Zhang, Qi Tian, Feng Zhao

arXiv:2512.05422 · cs.CV · March 31, 2026

TL;DR

ParaUni introduces a hierarchical, parallel feature-extraction method for unified multimodal models: it integrates visual features from multiple VLM layers and uses reinforcement learning to further improve visual generation quality.

Contribution

The paper proposes ParaUni, a novel parallel feature-extraction framework with a dynamic reward adjustment mechanism for improved multimodal generation.

Findings

01. Significantly improves visual generation quality.

02. Effectively integrates multi-layer features for better performance.

03. Demonstrates strong potential for reward-based training improvements.

Abstract

Unified multimodal models significantly improve visual generation by combining vision-language models (VLMs) with diffusion models. However, because of the large representational gap between the two, existing methods struggle to balance sufficient interaction with flexible implementation. Since a VLM's layers carry abundant, hierarchical information ranging from low-level details to high-level semantics, we propose ParaUni. It extracts features from different VLM layers in a Parallel way for comprehensive information interaction, while retaining a flexible, separated architecture to enhance generation in a Unified multimodal model. Concretely, visual features from all VLM layers are fed in parallel into a Layer Integration Module (LIM), which efficiently integrates fine-grained details and semantic abstractions and provides the fused representation as a condition to the diffusion…
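The abstract describes the Layer Integration Module only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes one learnable scalar logit per VLM layer, fuses the token features of all layers at once as a softmax-weighted sum, and then applies a linear map into a hypothetical diffusion-condition width. The names `layer_integration`, `project_to_condition`, and `layer_logits` are hypothetical, and the real LIM may use attention or other learned components. Plain Python lists are used so the sketch needs no tensor library.

```python
import math
from typing import List

# Hypothetical types: a token feature is a list of floats; a layer output is a list of tokens.
Vector = List[float]
Tokens = List[Vector]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over per-layer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def layer_integration(layer_feats: List[Tokens], layer_logits: List[float]) -> Tokens:
    """Hypothetical LIM: fuse token features from ALL VLM layers in one pass.

    layer_feats[l][t] is the feature of token t at VLM layer l (shallow to deep).
    Assumption (not from the paper): one scalar logit per layer, normalized with
    softmax, so shallow (detail) and deep (semantic) layers contribute in parallel.
    """
    if len(layer_feats) != len(layer_logits):
        raise ValueError("expected one logit per VLM layer")
    weights = softmax(layer_logits)
    num_tokens = len(layer_feats[0])
    dim = len(layer_feats[0][0])
    fused = [[0.0] * dim for _ in range(num_tokens)]
    # Accumulate the weighted contribution of every layer for every token.
    for w, feats in zip(weights, layer_feats):
        for t in range(num_tokens):
            for k in range(dim):
                fused[t][k] += w * feats[t][k]
    return fused


def project_to_condition(fused: Tokens, weight: List[Vector], bias: Vector) -> Tokens:
    """Linear map (out_dim x dim) into a hypothetical diffusion-condition width."""
    return [
        [sum(w_k * x_k for w_k, x_k in zip(row, token)) + b for row, b in zip(weight, bias)]
        for token in fused
    ]


if __name__ == "__main__":
    # Three toy "layers", two tokens each, feature dim 2.
    layers = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[3.0, 0.0], [0.0, 3.0]],
        [[5.0, 0.0], [0.0, 5.0]],
    ]
    fused = layer_integration(layers, [0.0, 0.0, 0.0])  # equal weights: layer average
    cond = project_to_condition(fused, weight=[[1.0, 1.0]], bias=[0.5])
    print(fused)  # approximately [[3.0, 0.0], [0.0, 3.0]]
    print(cond)   # approximately [[3.5], [3.5]]
```

Softmax-normalized per-layer weights are only one simple way to let every layer contribute to the condition; reading a single (last) layer corresponds to putting all of the weight on one logit.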

Code & Models

Repositories

JosephTiTan/ParaUni (GitHub)