Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors
Henrik Klagges, Robert Dahlke, Fabian Klemm, Benjamin Merkel, Daniel Klingmann, David A. Reiss, Dan Zecha

TL;DR
This paper introduces a linear-time method to create hybrid language models by interpolating weights from parent models, resulting in functional, adaptable, and efficient Chimera variants with emergent behaviors.
Contribution
The paper presents the Assembly-of-Experts method for rapid construction of hybrid LLMs, enabling new models with emergent traits without fine-tuning or distillation.
Findings
Nearly all generated models are functional and capable.
The Chimera model achieves R1-level intelligence with 40% fewer tokens.
Behavioral traits change gradually or abruptly depending on weight interpolation.
Abstract
Requiring - FLOPs to calculate one 8-bit weight in an LLM during pretraining is extremely expensive and seems inefficient. To better leverage the huge investments made into pretrained models, we develop the new "Assembly-of-Experts" (AoE) construction method to create capable child variants of existing Mixture-of-Experts parent models in linear time. Model weight tensors are interpolated individually, allowing semantic features of the parents to be enhanced or suppressed. When varying the proportion of weights taken from the parent models, we observe some properties of the AoE child model changing gradually, while other behavioral traits emerge with a sharp transition. Surprisingly, nearly every generated model is functional and capable, which makes searching the model space straightforward. We construct the DeepSeek R1T "Chimera", a 671B open-weights hybrid model combining…
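The tensor-wise interpolation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes both parents are already loaded as NumPy arrays keyed by tensor name with identical shapes (real 8-bit checkpoints would first need dequantizing), and the helper `assemble_child`, the per-tensor weight function `lam_for`, and the tensor names are all hypothetical. Each child tensor is the combination λ·W_A + (1 − λ)·W_B, and every weight is visited once, so the cost grows linearly with the parameter count.

```python
import numpy as np


def assemble_child(parent_a, parent_b, lam_for):
    """Build a child model by interpolating two parents tensor by tensor.

    parent_a, parent_b: dicts mapping tensor name -> np.ndarray (hypothetical
        layout; both parents must share the same names and shapes).
    lam_for: callable taking a tensor name and returning the weight in [0, 1]
        given to parent_a; parent_b receives 1 - lam.
    """
    if parent_a.keys() != parent_b.keys():
        raise ValueError("parents must contain the same tensor names")
    child = {}
    for name, w_a in parent_a.items():
        w_b = parent_b[name]
        if w_a.shape != w_b.shape:
            raise ValueError(f"shape mismatch for tensor {name!r}")
        lam = float(lam_for(name))
        if not 0.0 <= lam <= 1.0:
            raise ValueError(f"lambda for {name!r} must lie in [0, 1], got {lam}")
        # Blend in float32, then cast back to parent A's dtype.
        mixed = lam * w_a.astype(np.float32) + (1.0 - lam) * w_b.astype(np.float32)
        child[name] = mixed.astype(w_a.dtype)
    return child


# Usage with toy tensors (names are hypothetical, not the real checkpoint layout).
rng = np.random.default_rng(0)
names = ["layers.0.attn.q_proj", "layers.0.mlp.experts.0.w1"]
a = {n: rng.standard_normal((4, 4)).astype(np.float32) for n in names}
b = {n: rng.standard_normal((4, 4)).astype(np.float32) for n in names}

# Take expert tensors from parent A and everything else from parent B.
child = assemble_child(a, b, lambda n: 1.0 if ".experts." in n else 0.0)

# A uniform 50/50 blend; sweeping this value yields a family of children.
half = assemble_child(a, b, lambda n: 0.5)
```

Setting λ per tensor (here 1.0 for tensors whose name contains `.experts.`, 0.0 elsewhere) copies whole tensor groups from one parent, while a uniform λ strictly between 0 and 1 blends every tensor; sweeping λ is one simple way to walk the child-model space the abstract describes.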
Code & Models
- 🤗 tngtech/DeepSeek-R1T-Chimera (model) · 73 dl · ♡ 267
- 🤗 tngtech/DeepSeek-TNG-R1T2-Chimera (model) · 1.8k dl · ♡ 270
- 🤗 unsloth/DeepSeek-TNG-R1T2-Chimera (model) · 17 dl · ♡ 6
- 🤗 unsloth/DeepSeek-TNG-R1T2-Chimera-BF16 (model) · 14 dl · ♡ 3
- 🤗 unsloth/DeepSeek-TNG-R1T2-Chimera-GGUF (model) · 364 dl · ♡ 15
- 🤗 bullerwins/DeepSeek-TNG-R1T2-Chimera-BF16 (model) · 6 dl · ♡ 1
- 🤗 Alphatao/Affine-0000000 (model) · 3 dl
- 🤗 Alphatao/Affine-1201201 (model) · 11 dl
- 🤗 Alphatao/Affine-1234567 (model)
- 🤗 Alphatao/Affine-7654321 (model) · 2 dl
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Multi-Agent Systems and Negotiation · Scheduling and Optimization Algorithms
