Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Tingting Gao, Di Zhang, Xi Li

TL;DR
This paper introduces a token-level gradient analysis method that reduces interference among tokens routed to the same expert in Mixture-of-Experts (MoE) models for large vision-language models (LVLMs), improving performance and efficiency.
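As a rough illustration of what token-level gradient analysis can look like, the sketch below flags tokens inside one expert whose gradients disagree with the rest. It assumes each token's gradient with respect to the expert's parameters has already been flattened into a vector, and it treats a token as conflicting when the cosine similarity between its gradient and the mean gradient of all tokens in that expert falls below a threshold (0 by default). The function names and the threshold are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # a zero gradient carries no direction
    return dot / (nu * nv)


def find_conflicting_tokens(
    token_grads: List[List[float]], threshold: float = 0.0
) -> List[int]:
    """Return indices of tokens (all routed to one expert) whose gradient
    points away from the expert's mean gradient.

    Assumption (hypothetical criterion, not from the source): a token
    conflicts when cosine(token_grad, mean_grad) < threshold.
    """
    if not token_grads:
        return []
    n = len(token_grads)
    dim = len(token_grads[0])
    # Mean gradient over all tokens assigned to this expert.
    mean = [sum(g[d] for g in token_grads) / n for d in range(dim)]
    return [i for i, g in enumerate(token_grads) if cosine(g, mean) < threshold]
```

With per-token gradients [1.0, 0.0], [0.9, 0.1] and [-1.0, 0.2], the mean is [0.3, 0.1] and only the third token has a negative similarity to it, so `find_conflicting_tokens` returns [2].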
Contribution
It proposes a novel regularization technique based on token gradient conflicts to enhance MoE training in LVLMs, addressing intra-expert interference issues.
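A regularizer driven by gradient conflicts could look roughly like the sketch below. The abstract is cut off before it describes the loss, so the form used here, a penalty of -log(1 - p) on the router probability p of a conflicting token's current expert, is an assumption for illustration and not the authors' formulation; `conflict_regularizer` and its arguments are hypothetical. It computes only the loss value; in training it would be written with autodiff tensors so the penalty can update the router.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def conflict_regularizer(
    router_logits: List[List[float]],
    assigned: List[int],
    conflicting: List[int],
    eps: float = 1e-9,
) -> float:
    """Hypothetical penalty: for each conflicting token, penalize the
    probability the router gives to the expert the token is currently
    routed to. Minimizing -log(1 - p) lowers p, nudging the token toward
    other experts. Returns the mean penalty over conflicting tokens."""
    if not conflicting:
        return 0.0
    total = 0.0
    for i in conflicting:
        p = softmax(router_logits[i])[assigned[i]]
        total += -math.log(max(1.0 - p, eps))  # eps guards against log(0)
    return total / len(conflicting)
```

For a conflicting token with equal logits over two experts, p = 0.5 and the penalty is ln 2 ≈ 0.693; raising the logit of the current expert increases the penalty, which is the direction of pressure such a regularizer is meant to apply.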
Findings
Effective reduction of token interference within experts.
Improved model performance demonstrated through experiments.
The method serves as a versatile plug-in for various LVLM approaches.
Abstract
Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It replaces the dense model with a sparse model, achieving comparable performance while activating fewer parameters during inference and thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to specialize in different tokens, and they usually employ a router to predict the routing of each token. However, the router is not optimized with respect to the distinct parameter optimization directions generated by tokens within an expert. This may lead to severe interference between tokens within an expert. To address this problem, we propose to use token-level gradient analysis to Solve Token Gradient Conflict (STGC). Specifically, we first use token-level gradients to identify conflicting tokens in experts. After that, we…
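For readers new to sparse MoE, the sketch below shows the kind of per-token router the abstract refers to: a linear layer scores every expert for a token, a softmax turns the scores into probabilities, and only the top-k experts are executed, which is where the reduced inference cost comes from. Renormalizing the gate weights over the selected experts is one common variant, not necessarily what the LVLMs discussed here use; `route_token` and its arguments are illustrative names, and experts are assumed to preserve the token's dimension.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def route_token(
    x: Vector,
    router_weights: List[Vector],
    experts: List[Callable[[Vector], Vector]],
    k: int = 1,
) -> Vector:
    """Sparse top-k MoE forward pass for a single token."""
    # Linear router (no bias): one logit per expert.
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in router_weights]
    probs = softmax(logits)
    # Keep only the k most probable experts; the others are never run.
    top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
    z = sum(probs[e] for e in top)  # renormalize gates over selected experts
    out = [0.0] * len(x)  # assumes each expert maps dim(x) -> dim(x)
    for e in top:
        gate = probs[e] / z
        y = experts[e](x)
        out = [o + gate * y_i for o, y_i in zip(out, y)]
    return out
```

Because every token in a batch goes through the same router, tokens with very different gradient directions can end up sharing one expert, which is the intra-expert interference the paper targets.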
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Softmax · Attention Is All You Need · Mixture of Experts
