Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

Longrong Yang; Dong Shen; Chaoxiang Cai; Fan Yang; Tingting Gao; Di Zhang; Xi Li

arXiv:2406.19905·cs.CV·March 18, 2025

TL;DR

This paper introduces a token-level gradient analysis method to reduce interference among tokens within experts in Mixture-of-Experts models for large vision-language models, improving performance and efficiency.

Contribution

It proposes a novel regularization technique based on token gradient conflicts to enhance MoE training in LVLMs, addressing intra-expert interference issues.

Findings

01 Effective reduction of token interference within experts.

02 Improved model performance demonstrated through experiments.

03 Method serves as a versatile plug-in for various LVLM approaches.

Abstract

Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It replaces the dense model with a sparse one, achieving comparable performance while activating fewer parameters during inference, which significantly reduces the inference cost. Existing MoE methods for LVLMs encourage different experts to specialize in different tokens, and they usually employ a router to predict the routing of each token. However, the router is not optimized with respect to the distinct parameter optimization directions generated by the tokens within an expert. This may lead to severe interference between tokens within an expert. To address this problem, we propose Solving Token Gradient Conflict (STGC), which uses token-level gradient analysis. Specifically, we first use token-level gradients to identify conflicting tokens in experts. After that, we…
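To make the idea of "conflicting tokens" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes, purely for illustration, that a token conflicts with its expert when the cosine similarity between that token's gradient and the mean gradient of all tokens routed to the expert falls below a threshold. The function name `find_conflicting_tokens`, the threshold value, and the toy gradients are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def find_conflicting_tokens(token_grads, threshold=0.0):
    """Return indices of tokens whose gradient opposes the expert's mean gradient.

    token_grads: one flattened gradient vector per token routed to the expert.
    threshold:   assumed cutoff; a cosine below it counts as a conflict.
    """
    if not token_grads:
        return []
    dim = len(token_grads[0])
    n = len(token_grads)
    # Mean gradient over all tokens assigned to this expert.
    mean_grad = [sum(g[i] for g in token_grads) / n for i in range(dim)]
    return [i for i, g in enumerate(token_grads) if cosine(g, mean_grad) < threshold]


# Toy example: four tokens routed to one expert, two-dimensional gradients.
grads = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2], [0.8, -0.1]]
conflicting = find_conflicting_tokens(grads)
print(conflicting)  # [2]: the third token points against the mean direction
```

In a real LVLM the per-token gradients would come from the training framework's autograd machinery and would be far higher-dimensional; the comparison against the mean direction is only one plausible way to define a conflict.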

Code & Models

Repositories

longrongyang/stgc (Official)

Models

gustavlangstroem/Microexpert_NG (model)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Softmax · Attention Is All You Need · Mixture of Experts