Flash-Unified: A Training-Free and Task-Aware Acceleration Framework for Native Unified Models
Junlong Ke, Zichen Wen, Boxue Yang, Yantai Yang, Xuyang Liu, Chenfei Liao, Zhaorun Chen, Shaobo Wang, Linfeng Zhang

TL;DR
This paper introduces FlashU, a training-free, task-aware acceleration framework for unified multimodal models. It speeds up inference by 1.78× to 2.01× by tailoring optimization to each task type (generation versus understanding) without sacrificing performance.
Contribution
The work reveals task-specific parameter specialization in unified models and proposes a novel, training-free acceleration method combining pruning, dynamic skipping, and task-specific optimizations.
Findings
Achieves 1.78× to 2.01× inference acceleration
Maintains state-of-the-art performance on Show-o2
Outperforms existing unified models
Abstract
Native unified multimodal models, which integrate both generative and understanding capabilities, face substantial computational overhead that hinders their real-world deployment. Existing acceleration techniques typically employ a static, monolithic strategy, ignoring the fundamental divergence in computational profiles between iterative generation tasks (e.g., image generation) and single-pass understanding tasks (e.g., VQA). In this work, we present the first systematic analysis of unified models, revealing pronounced parameter specialization, where distinct neuron sets are critical for each task. This implies that, at the parameter level, unified models have implicitly internalized separate inference pathways for generation and understanding within a single architecture. Based on these insights, we introduce a training-free and task-aware acceleration framework, FlashU, that tailors…
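To make the idea of parameter specialization concrete, here is a minimal sketch, not the paper's method. It assumes each neuron already has a per-task importance score (the toy values and the names `top_k_neurons`, `jaccard`, and `task_mask` are hypothetical), measures how much the top-scoring neuron sets of two tasks overlap, and builds a task-specific keep mask of the kind a task-aware pruning scheme might use.

```python
# Illustrative only: importance scores and helper names are assumptions,
# not taken from the FlashU paper.

def top_k_neurons(scores, k):
    """Return the set of indices of the k highest-scoring neurons."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard overlap of two index sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def task_mask(scores, keep_ratio):
    """Boolean keep-mask retaining the top `keep_ratio` fraction of neurons."""
    k = max(1, int(len(scores) * keep_ratio))
    keep = top_k_neurons(scores, k)
    return [i in keep for i in range(len(scores))]


# Toy per-neuron importance scores for a generation task and an
# understanding task over the same six neurons (made-up numbers).
gen_scores = [0.9, 0.8, 0.1, 0.2, 0.7, 0.05]
und_scores = [0.1, 0.2, 0.9, 0.8, 0.6, 0.05]

gen_top = top_k_neurons(gen_scores, 3)
und_top = top_k_neurons(und_scores, 3)

# A low overlap means the two tasks rely on largely distinct neuron sets.
overlap = jaccard(gen_top, und_top)

# Task-aware pruning: keep a different half of the neurons per task.
gen_mask = task_mask(gen_scores, 0.5)
und_mask = task_mask(und_scores, 0.5)

print(f"overlap between task-critical sets: {overlap:.2f}")
print("generation keep-mask:   ", gen_mask)
print("understanding keep-mask:", und_mask)
```

In this toy setup only one neuron is shared between the two top-3 sets, so a single static mask would prune neurons that one of the tasks depends on. That is the intuition behind choosing the mask per task rather than once for the whole model.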
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
