Flash-Unified: A Training-Free and Task-Aware Acceleration Framework for Native Unified Models

Junlong Ke; Zichen Wen; Boxue Yang; Yantai Yang; Xuyang Liu; Chenfei Liao; Zhaorun Chen; Shaobo Wang; Linfeng Zhang

arXiv:2603.15271·cs.CV·March 17, 2026

TL;DR

This paper introduces FlashU, a training-free, task-aware acceleration framework for unified multimodal models. It speeds up inference by 1.78× to 2.01× by tailoring the optimization to each task, while maintaining state-of-the-art performance on Show-o2.

Contribution

The work reveals task-specific parameter specialization in unified models and proposes a novel, training-free acceleration method combining pruning, dynamic skipping, and task-specific optimizations.

Findings

01 Achieves 1.78× to 2.01× inference acceleration

02 Maintains state-of-the-art performance on Show-o2

03 Outperforms existing unified models

Abstract

Native unified multimodal models, which integrate both generative and understanding capabilities, face substantial computational overhead that hinders their real-world deployment. Existing acceleration techniques typically employ a static, monolithic strategy, ignoring the fundamental divergence in computational profiles between iterative generation tasks (e.g., image generation) and single-pass understanding tasks (e.g., VQA). In this work, we present the first systematic analysis of unified models, revealing pronounced parameter specialization, where distinct neuron sets are critical for each task. This implies that, at the parameter level, unified models have implicitly internalized separate inference pathways for generation and understanding within a single architecture. Based on these insights, we introduce a training-free and task-aware acceleration framework, FlashU, that tailors…
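
The idea of parameter specialization can be illustrated with a small sketch. This is not the paper's analysis procedure: it assumes neuron importance is the mean absolute activation, uses synthetic activations, and measures how much the top-k neuron sets of two tasks overlap with a Jaccard score. All function names and the toy data are hypothetical.

```python
import random


def top_k_neurons(activations, k):
    """Return indices of the k neurons with the largest mean |activation|.

    activations: list of samples, each a list of per-neuron values.
    Mean absolute activation is an assumed importance score, chosen
    only for illustration.
    """
    n_neurons = len(activations[0])
    importance = [
        sum(abs(sample[j]) for sample in activations) / len(activations)
        for j in range(n_neurons)
    ]
    ranked = sorted(range(n_neurons), key=lambda j: importance[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def synthetic_activations(active, n_neurons, n_samples, rng):
    """Toy data: neurons in `active` fire strongly, the rest stay near zero."""
    return [
        [
            rng.gauss(0.0, 1.0) if j in active else rng.gauss(0.0, 0.01)
            for j in range(n_neurons)
        ]
        for _ in range(n_samples)
    ]


rng = random.Random(0)
# Hypothetical setup: generation relies on neurons 0-7, understanding on 6-13.
gen_acts = synthetic_activations(set(range(0, 8)), 16, 200, rng)
und_acts = synthetic_activations(set(range(6, 14)), 16, 200, rng)

gen_top = top_k_neurons(gen_acts, 8)
und_top = top_k_neurons(und_acts, 8)
overlap = jaccard(gen_top, und_top)
print(f"shared critical neurons: {sorted(gen_top & und_top)}, Jaccard = {overlap:.3f}")
```

A low overlap score in such a measurement would be consistent with the abstract's claim that distinct neuron sets are critical for each task, which is what motivates pruning or skipping differently per task rather than applying one static strategy.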

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning