Cost-Penalized Fitness in FMA-Orchestrated Mixture of Experts: Experimental Evidence for Molecular Memory in Domain Adaptation

Martin Jaraiz

arXiv:2604.00812 · cs.LG · April 2, 2026

TL;DR

Cost-penalized fitness metrics in a transformer with dynamic Mixture-of-Experts (MoE) management let the expert pool accumulate domain expertise and reactivate dormant experts, giving 9-11x faster recovery on a previously learned domain and an estimated $39.1M in annual savings.

Contribution

The paper introduces an MoE management approach that combines cost-penalized fitness with a linear grace period for newborn experts, producing a "molecular memory" effect in domain adaptation.
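The combination of a cost penalty and a grace period can be illustrated with a minimal sketch. The paper's actual fitness formula is not given in this summary, so everything below is an assumption made for illustration: the `Expert` fields, the `cost_weight` and `grace_steps` parameters, and the form "utility minus weighted cost" are all hypothetical. The sketch only shows the general idea that a newborn expert is shielded from the cost penalty at first, with the penalty ramping up linearly as it ages.

```python
from dataclasses import dataclass


@dataclass
class Expert:
    # All fields are hypothetical; the paper may track different quantities.
    name: str
    utility: float  # assumed benefit signal, e.g. loss reduction credited to the expert
    cost: float     # assumed running cost, e.g. compute spent per step
    age: int        # training steps since the expert was created


def grace_factor(age: int, grace_steps: int) -> float:
    """Linear ramp: 0.0 for a newborn expert, 1.0 once the grace period is over."""
    if grace_steps <= 0:
        return 1.0  # no grace period: full penalty from the start
    return min(max(age, 0) / grace_steps, 1.0)


def penalized_fitness(expert: Expert, cost_weight: float = 0.5,
                      grace_steps: int = 100) -> float:
    """Assumed form: utility minus a cost penalty scaled by the grace ramp."""
    penalty = cost_weight * grace_factor(expert.age, grace_steps) * expert.cost
    return expert.utility - penalty


if __name__ == "__main__":
    pool = [
        Expert("newborn", utility=0.2, cost=1.0, age=0),
        Expert("adolescent", utility=0.2, cost=1.0, age=50),
        Expert("veteran", utility=0.6, cost=1.0, age=500),
    ]
    for e in pool:
        print(f"{e.name}: fitness = {penalized_fitness(e):.3f}")
```

Under these assumed numbers the newborn expert pays no penalty, the half-aged expert pays half of it, and the veteran pays it in full. An expert whose utility stays low once the ramp completes would become a candidate for removal, while a low-cost dormant expert could stay in the pool.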

Findings

01. 9-11x faster recovery when returning to a previously learned domain, with zero expert births or replacements

02. Dormant experts survive and reactivate when their domain returns, giving the system domain memory

03. Estimated annual savings of $39.1M and a 27.1 GWh reduction in energy use

Abstract

We present experimental results from seven controlled runs of nanoFMT, a Free-Market Algorithm (FMA) orchestrated transformer with dynamic Mixture-of-Experts (MoE) management. The experiments address a fundamental question for advanced LLM development: how should an MoE system manage its expert pool when operating at full capacity under changing data distributions? We demonstrate that cost-penalized fitness metrics, combined with a linear grace period for newborn experts, produce a system that accumulates domain expertise through diversification rather than replacement. The central result is a round-trip domain shift experiment showing 9-11x faster recovery when returning to a previously learned domain, with zero expert births or replacements required. This "molecular memory" effect -- where dormant experts survive and reactivate when their domain returns -- has no analogue in current…
