CCoE: A Compact and Efficient LLM Framework with Multi-Expert Collaboration for Resource-Limited Settings

Shaomang Huang; Jianfeng Pan; Min Peng; Hanzhong Zheng

arXiv:2407.11686·cs.CL·February 18, 2025·1 cites

Open Access

TL;DR

The paper introduces CCoE, a modular multi-expert LLM framework that enhances resource efficiency and domain adaptability, achieving high performance with reduced memory and inference costs in resource-limited settings.

Contribution

CCoE is a novel modular framework that integrates domain-specific experts into a shared LLM backbone, enabling efficient multi-domain support with flexible expert collaboration.

Findings

01. Achieves state-of-the-art performance across five domains.

02. Reduces memory usage by 61.3% compared to ensemble methods.

03. Improves inference efficiency by 0.76x over existing multi-expert approaches.

Abstract

Large Language Models (LLMs) have achieved exceptional performance across diverse domains through training on massive datasets. However, scaling LLMs to support multiple downstream domain applications remains a significant challenge, especially under resource constraints. Existing approaches often struggle to balance performance across multiple domains with resource efficiency, limiting their broader applicability. To address this, we introduce the CCoE architecture, a modular framework that seamlessly integrates domain-specific experts into a unified LLM. By leveraging independently trained expert subnetworks on a shared backbone partition, CCoE achieves state-of-the-art performance while significantly reducing the resource requirements for multi-expert deployments. Furthermore, rule-based gating and expert planning in CCoE enable flexible task allocation, promoting expert…
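The abstract's combination of a shared backbone, independently trained experts, and rule-based gating can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the `Expert` class, the keyword rules, the "math" and "code" domains, and the string-based stand-in for the backbone are hypothetical, and the paper's actual gating rules and expert planning are not described in this excerpt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Expert:
    """Hypothetical domain expert: a name, a keyword rule, and a handler."""
    name: str
    keywords: List[str]
    run: Callable[[str], str]


def shared_backbone(query: str) -> str:
    """Stand-in for the shared backbone layers; here it only normalizes text."""
    return query.strip().lower()


def rule_based_gate(hidden: str, experts: List[Expert]) -> List[Expert]:
    """Assumed gating rule: pick every expert with a matching keyword,
    preserving registration order."""
    return [e for e in experts if any(k in hidden for k in e.keywords)]


def answer(query: str, experts: List[Expert],
           fallback: Callable[[str], str]) -> List[str]:
    """Run the shared backbone once, then only the experts the gate selects."""
    hidden = shared_backbone(query)
    selected = rule_based_gate(hidden, experts)
    if not selected:
        # No rule fired: fall back to the general-purpose path.
        return [fallback(hidden)]
    return [e.run(hidden) for e in selected]


# Illustrative experts; the domain names are assumptions, not the paper's list.
EXPERTS = [
    Expert("math", ["equation", "integral"], lambda h: "math:" + h),
    Expert("code", ["python", "code"], lambda h: "code:" + h),
]


def general(hidden: str) -> str:
    return "general:" + hidden


multi = answer("Write Python code to solve this equation", EXPERTS, general)
single = answer("Evaluate the integral", EXPERTS, general)
none = answer("Tell me a story", EXPERTS, general)
```

In this sketch a query that matches several rules is sent to several experts, which loosely mirrors the idea of expert collaboration, while the backbone computation is shared rather than duplicated per expert.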

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Balanced Selection