CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning
Jinyuan Feng, Chaopeng Wei, Tenghai Qiu, Tianyi Hu, Zhiqiang Pu

TL;DR
This paper introduces CoMoE, a contrastive learning approach for mixture-of-experts (MoE) models in parameter-efficient fine-tuning. It improves expert specialization and utilization, leading to better performance on benchmarks.
Contribution
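The paper proposes a novel contrastive training method for MoE that enhances expert modularization and capacity utilization. It targets a limitation of prior MoE variants on heterogeneous datasets: experts may learn similar knowledge, leaving MoE capacity underused.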
Findings
CoMoE improves utilization of MoE capacity and expert specialization.
It improves performance on multiple benchmarks.
It promotes modularization among experts.
Abstract
In parameter-efficient fine-tuning, mixture-of-experts (MoE), which involves specializing functionalities into different experts and sparsely activating them appropriately, has been widely adopted as a promising approach to trade off model capacity against computation overhead. However, current MoE variants fall short on heterogeneous datasets, ignoring the fact that experts may learn similar knowledge, resulting in the underutilization of MoE's capacity. In this paper, we propose Contrastive Representation for MoE (CoMoE), a novel method to promote modularization and specialization in MoE, where the experts are trained along with a contrastive objective by sampling from activated and inactivated experts in top-k routing. We demonstrate that such a contrastive objective recovers the mutual-information gap between inputs and the two types of experts. Experiments on several benchmarks…
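The idea in the abstract, contrasting activated against inactivated experts under top-k routing, can be illustrated with a minimal sketch. This is not the paper's objective: it assumes an InfoNCE-style loss, cosine similarity, an input representation with the same dimension as the expert outputs, and a temperature of 0.1. All function names (`top_k_route`, `contrastive_loss`, and the helpers) are hypothetical and chosen for illustration only.

```python
import math


def top_k_route(router_logits, k):
    """Return (activated, inactivated) expert indices under top-k routing."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    return order[:k], order[k:]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def contrastive_loss(anchor, expert_outputs, activated, inactivated,
                     temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's exact objective).

    Outputs of activated experts act as positives for the input
    representation `anchor`; outputs of inactivated experts act as negatives.
    """
    scores = {i: cosine(anchor, expert_outputs[i]) / temperature
              for i in list(activated) + list(inactivated)}
    log_denominator = log_sum_exp(list(scores.values()))
    # Average of -log softmax(score) over the positive (activated) experts.
    return sum(log_denominator - scores[i] for i in activated) / len(activated)


# Toy example: 4 experts, top-2 routing, 2-dimensional representations.
router_logits = [2.0, 0.5, 1.5, -1.0]
activated, inactivated = top_k_route(router_logits, k=2)

anchor = [1.0, 0.0]                      # hypothetical input representation
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
loss = contrastive_loss(anchor, expert_outputs, activated, inactivated)
```

In this toy setup the loss is small when the activated experts' outputs align with the input and the inactivated experts' outputs do not, which is the kind of separation between the two expert groups that the abstract describes. In practice such a term would presumably be added to the task loss with a weighting coefficient; that weighting is not specified in the text above.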
Taxonomy
Topics: Speech and Audio Processing · Target Tracking and Data Fusion in Sensor Networks · Distributed Sensor Networks and Detection Algorithms
