CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning

Jinyuan Feng; Chaopeng Wei; Tenghai Qiu; Tianyi Hu; Zhiqiang Pu

arXiv:2505.17553 · cs.LG · August 29, 2025

TL;DR

This paper introduces CoMoE, which trains mixture-of-experts models with a contrastive objective over the activated and inactivated experts in top-k routing. The objective improves expert specialization and capacity utilization, leading to better performance on benchmarks.

Contribution

The paper proposes a novel contrastive training method for MoE that enhances expert modularization and capacity utilization. It addresses a limitation of prior MoE variants on heterogeneous datasets, where experts may learn similar knowledge.

Findings

01. CoMoE improves model capacity and expert specialization.

02. CoMoE enhances performance on multiple benchmarks.

03. CoMoE promotes modularization among experts.

Abstract

In parameter-efficient fine-tuning, mixture-of-experts (MoE), which involves specializing functionalities into different experts and sparsely activating them appropriately, has been widely adopted as a promising approach to trade off model capacity against computation overhead. However, current MoE variants fall short on heterogeneous datasets, ignoring the fact that experts may learn similar knowledge, which results in the underutilization of MoE's capacity. In this paper, we propose Contrastive Representation for MoE (CoMoE), a novel method to promote modularization and specialization in MoE, where the experts are trained along with a contrastive objective by sampling from activated and inactivated experts in top-k routing. We demonstrate that such a contrastive objective recovers the mutual-information gap between inputs and the two types of experts. Experiments on several benchmarks…
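
The abstract does not give the loss itself, so the following is only a rough illustration of the idea it describes: treat the experts selected by top-k routing as positives and the unselected experts as negatives for a given input. The sketch uses a generic InfoNCE-style loss with cosine similarity. The function names, the similarity measure, and the `temperature` parameter are all assumptions made for illustration, not the paper's formulation.

```python
import math


def top_k_split(router_logits, k):
    """Return (activated, inactivated) expert indices under top-k routing."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    return order[:k], order[k:]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)


def contrastive_expert_loss(x, expert_outputs, router_logits,
                            k=2, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    Each activated expert is a positive for input x; every inactivated
    expert serves as a negative. Requires 1 <= k <= number of experts.
    """
    active, inactive = top_k_split(router_logits, k)
    scores = [cosine(x, e) / temperature for e in expert_outputs]
    negatives = [scores[j] for j in inactive]

    losses = []
    for i in active:
        logits = [scores[i]] + negatives
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        losses.append(log_denom - scores[i])  # -log softmax of the positive
    return sum(losses) / len(losses)


# Toy usage: 4 experts, 3-dimensional representations, top-2 routing.
x = [1.0, 1.0, 1.0]
experts = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0],
           [-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0]]
loss = contrastive_expert_loss(x, experts, router_logits=[3.0, 2.0, 1.0, 0.0])
```

In this toy setup the loss is close to zero when the activated experts align with the input and the inactivated ones do not, and it grows when the routing is reversed. A real training setup would compute this over batches with a differentiable framework and add it to the task loss.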

Taxonomy

Topics: Speech and Audio Processing · Target Tracking and Data Fusion in Sensor Networks · Distributed Sensor Networks and Detection Algorithms