HDEE: Heterogeneous Domain Expert Ensemble

Oğuzhan Ersoy; Jari Kolehmainen; Gabriel Passamani Andrade

arXiv:2502.19385 · cs.LG · February 27, 2025

TL;DR

This paper investigates how introducing heterogeneity into ensembles of domain experts, by varying model sizes and training steps, improves performance across diverse data domains compared to homogeneous ensembles.

Contribution

It demonstrates that heterogeneous ensembles outperform homogeneous ones in most domains by adapting model size and training effort to domain complexity.

Findings

01. Heterogeneous ensembles achieve lower perplexity in 20 out of 21 domains.

02. Varying model size and training steps improves domain-specific performance.

03. Heterogeneity benefits are consistent across included and excluded domains.
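
The findings above are stated in terms of perplexity, so a small worked example may help. The sketch below computes perplexity as the exponential of the mean negative log-likelihood per token, and shows one possible way to score an ensemble by mixing per-token probabilities from several experts. The fixed mixture weights, the function names, and the toy probabilities are all assumptions made for illustration; this summary does not describe how the paper actually combines its experts.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def ensemble_log_probs(expert_probs, weights):
    """Mix per-token probabilities from several experts.

    expert_probs: one list per expert, each holding the probability that
                  expert assigned to the true token at every position.
    weights:      fixed mixture weights summing to 1 (an assumption here,
                  not necessarily the paper's weighting scheme).
    """
    mixed = []
    for token_probs in zip(*expert_probs):
        p = sum(w * q for w, q in zip(weights, token_probs))
        mixed.append(math.log(p))
    return mixed


# Hypothetical probabilities assigned to the true tokens by two experts.
expert_a = [0.5, 0.25, 0.5]
expert_b = [0.25, 0.5, 0.25]
weights = [0.5, 0.5]

ppl_a = perplexity([math.log(p) for p in expert_a])
ppl_b = perplexity([math.log(p) for p in expert_b])
ppl_mix = perplexity(ensemble_log_probs([expert_a, expert_b], weights))

print(f"expert A: {ppl_a:.3f}, expert B: {ppl_b:.3f}, mixture: {ppl_mix:.3f}")
```

Lower perplexity means the model assigns higher probability to the observed tokens, which is the sense in which one ensemble "outperforms" another in the findings.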

Abstract

Training dense LLMs requires enormous amounts of data and centralized compute, which introduces fundamental bottlenecks and ever-growing costs for large models. Several studies aim to reduce this dependency on centralization by reducing the communication overhead of training dense models. Taking this idea of reducing communication overhead to a natural extreme, by training embarrassingly parallelizable ensembles of small independent experts, has been shown to outperform large dense models trained in traditional centralized settings. However, existing studies do not take into account underlying differences amongst data domains and treat them as monolithic, regardless of their underlying complexity, size, or distribution. In this paper, we explore the effects of introducing heterogeneity to these ensembles of domain expert models. Specifically, by allowing models within the ensemble to…

Code & Models

Repositories

gensyn-ai/hdee (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Advanced Graph Neural Networks