XPERT: Expert Knowledge Transfer for Effective Training of Language Models
Chang Liu, Boyu Shi, Xu Yang, Xin Geng

TL;DR
XPERT is a framework that extracts and reuses expert knowledge from pre-trained MoE language models, improving training efficiency and performance on language understanding and dialogue tasks.
Contribution
The paper presents a novel method for identifying, refining, and reusing expert knowledge from MoE LLMs to improve training effectiveness.
Findings
Models trained with reused expert knowledge outperform baselines on language understanding and dialogue tasks.
Reusing expert knowledge leads to faster convergence during training.
Cross-domain experts encode generalizable knowledge beneficial for multiple tasks.
Abstract
Mixture-of-Experts (MoE) language models organize knowledge into explicitly routed expert modules, making expert-level representations traceable and analyzable. By analyzing expert activation patterns in MoE large language models (LLMs), we find that a subset of experts is consistently activated across diverse knowledge domains. These common experts encode cross-domain, generalizable knowledge that is closely related to model generalization, naturally raising the question of how such identifiable expert knowledge can be practically reused. Motivated by this observation, we propose XPERT, a framework that extracts, consolidates, and reuses expert knowledge from pre-trained MoE LLMs to support more effective training of language models across different model scales. XPERT identifies cross-domain experts via inference-only analysis, refines their representations through tensor…
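The abstract describes finding experts that are consistently activated across knowledge domains using inference-only analysis. The following is a minimal sketch of that idea, not the paper's procedure: it assumes the router's chosen expert ids have already been logged per domain, and it uses a hypothetical selection rule (an expert counts as "common" if it ranks in the top `top_k` by activation frequency in every domain). The function names, the toy routing logs, and the `top_k` threshold are all illustrative assumptions.

```python
from collections import Counter


def activation_frequency(routed_expert_ids, num_experts):
    """Return, for each expert, the fraction of routing decisions that chose it."""
    total = len(routed_expert_ids)
    if total == 0:
        return [0.0] * num_experts
    counts = Counter(routed_expert_ids)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def common_experts(routing_by_domain, num_experts, top_k):
    """Experts ranked in the top_k most activated for every domain.

    Assumption: "consistently activated" is approximated by a per-domain
    top-k cut followed by an intersection over domains.
    """
    common = set(range(num_experts))
    for expert_ids in routing_by_domain.values():
        freq = activation_frequency(expert_ids, num_experts)
        ranked = sorted(range(num_experts), key=lambda e: freq[e], reverse=True)
        common &= set(ranked[:top_k])
    return sorted(common)


# Hypothetical routing logs: one expert id per routed token, grouped by domain.
routing_by_domain = {
    "math": [0, 0, 1, 0, 1, 3],
    "law": [0, 2, 0, 2, 1, 0],
    "code": [0, 3, 3, 0, 1, 3],
}

result = common_experts(routing_by_domain, num_experts=4, top_k=2)
print(result)
```

In this toy data, expert 0 is the only one that stays in the top two for all three domains, while experts 1, 2, and 3 are each prominent in a single domain. A real analysis would aggregate routing decisions per layer over many tokens, and the threshold would need to be chosen with care.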
