HDEE: Heterogeneous Domain Expert Ensemble
Oğuzhan Ersoy, Jari Kolehmainen, Gabriel Passamani Andrade

TL;DR
This paper investigates whether introducing heterogeneity into ensembles of domain experts, by varying model size and the number of training steps per expert, improves performance across diverse data domains relative to homogeneous ensembles.
Contribution
It demonstrates that heterogeneous ensembles outperform homogeneous ones in most domains by adapting model size and training effort to domain complexity.
Findings
Heterogeneous ensembles achieve lower perplexity in 20 out of 21 domains.
Varying model size and training steps improves domain-specific performance.
Heterogeneity benefits are consistent across included and excluded domains.
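To make the perplexity comparison concrete, here is a minimal sketch of how perplexity might be computed for a single expert and for an ensemble. The perplexity formula is standard; the weighted-mixture combination rule, the function names, and the toy numbers are assumptions for illustration only and are not taken from the paper, which may combine experts differently.

```python
import math

def perplexity(token_log_probs):
    """Perplexity: exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def mixture_log_probs(expert_log_probs, weights):
    """Per-token log-probabilities under a weighted mixture of experts.

    ASSUMPTION: experts are combined by a convex mixture of their
    token probabilities; this rule is illustrative, not the paper's.
    expert_log_probs: one list of per-token log-probs per expert.
    weights: non-negative mixture weights that sum to 1.
    """
    n_tokens = len(expert_log_probs[0])
    mixed = []
    for t in range(n_tokens):
        p = sum(w * math.exp(lp[t]) for w, lp in zip(weights, expert_log_probs))
        mixed.append(math.log(p))
    return mixed

# Toy data (hypothetical): two experts scoring the same three tokens.
expert_a = [math.log(0.5)] * 3    # assigns probability 0.5 to each token
expert_b = [math.log(0.25)] * 3   # assigns probability 0.25 to each token

ppl_a = perplexity(expert_a)
ppl_b = perplexity(expert_b)
ppl_mix = perplexity(mixture_log_probs([expert_a, expert_b], [0.5, 0.5]))
print(ppl_a, ppl_b, ppl_mix)
```

Lower perplexity means the model assigns higher probability to the held-out text, which is the sense in which one ensemble "outperforms" another in a given domain.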
Abstract
Training dense LLMs requires enormous amounts of data and centralized compute, which introduces fundamental bottlenecks and ever-growing costs for large models. Several studies aim to reduce this dependency on centralization by reducing the communication overhead of training dense models. Taking this idea of reducing communication overhead to a natural extreme, by training embarrassingly parallelizable ensembles of small independent experts, has been shown to outperform large dense models trained in traditional centralized settings. However, existing studies do not take into account underlying differences amongst data domains and treat them as monolithic, regardless of their underlying complexity, size, or distribution. In this paper, we explore the effects of introducing heterogeneity to these ensembles of domain expert models. Specifically, by allowing models within the ensemble to…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Advanced Graph Neural Networks
