Resolving Task Objective Conflicts in Unified Model via Task-Aware Mixture-of-Experts

Jiaxing Zhang; Hao Tang

arXiv:2506.03591 · cs.CV · October 7, 2025

TL;DR

This paper introduces UTAMoE, a novel mixture-of-experts framework that decouples internal autoregressive modules in multimodal large language models to resolve task conflicts and improve performance.

Contribution

It proposes a task-aware MoE layer and a two-stage training strategy to effectively mitigate task conflicts in unified multimodal models.

Findings

01. Achieves state-of-the-art results on multimodal benchmarks.

02. Effectively reduces task interference and improves task-specific performance.

03. Validates approach through extensive ablation studies.

Abstract

Unified multimodal large language models (MLLMs) based on end-to-end autoregressive (AR) transformers effectively integrate both understanding and generation tasks within a single framework. However, intrinsic Task Objective Conflicts between high-level semantic abstraction in understanding and fine-grained detail preservation in generation pose significant challenges, often leading to suboptimal trade-offs and task interference. Existing solutions, such as decoupling shared visual encoders, fall short of fundamentally resolving these conflicts because of the inherent AR architecture. In this paper, we propose a novel approach that decouples the internal components of the AR model to resolve task objective conflicts. Specifically, we design UTAMoE, a Unified Task-Aware Mixture-of-Experts (MoE) framework that decouples internal AR modules via a Task-Aware MoE Layer to create task-specific optimization…
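The abstract is truncated before it describes how the Task-Aware MoE Layer routes inputs, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each task ("understanding" and "generation") owns a separate group of experts and its own softmax gate, so the two objectives never update the same expert weights. The class name `TaskAwareMoE`, the linear experts, and all sizes are hypothetical choices made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TaskAwareMoE:
    """Hypothetical task-aware MoE layer: one expert group and one gate per task."""

    def __init__(self, dim, experts_per_task,
                 tasks=("understanding", "generation"), seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # Assumption: experts are plain linear maps and are NOT shared across tasks.
        self.experts = {t: [rand_matrix(dim, dim) for _ in range(experts_per_task)]
                        for t in tasks}
        # Assumption: each task has its own gate scoring only its own experts.
        self.gates = {t: rand_matrix(experts_per_task, dim) for t in tasks}

    def forward(self, x, task):
        # The task label selects the expert group; the gate mixes within it.
        weights = softmax(matvec(self.gates[task], x))
        outputs = [matvec(W, x) for W in self.experts[task]]
        mixed = [sum(w * out[i] for w, out in zip(weights, outputs))
                 for i in range(len(x))]
        return mixed, weights


layer = TaskAwareMoE(dim=4, experts_per_task=2)
x = [1.0, 0.5, -0.5, 2.0]
y_und, w_und = layer.forward(x, "understanding")
y_gen, w_gen = layer.forward(x, "generation")
```

The same hidden vector `x` yields different outputs depending on the task label, which is the sense in which such a layer gives each task its own optimization path. Whether UTAMoE uses hard task routing, shared experts, or top-k gating is not stated in the text above.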

Taxonomy

Topics: Human-Automation Interaction and Safety · Context-Aware Activity Recognition Systems · Multi-Agent Systems and Negotiation