Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

Taebong Kim; Youngsik Hong; Minsik Kim; Sunyoung Choi; Jaewon Jang; Junghoon Shin; Minseo Kim

arXiv:2605.14386·cs.NE·May 15, 2026

21 models · 1 dataset

TL;DR

Darwin Family introduces a training-free evolutionary merging framework for large language models that improves reasoning performance by reorganizing existing checkpoints without additional training.

Contribution

It proposes a novel gradient-free weight-space recombination method with adaptive merging, trust balancing, and cross-architecture breeding, enabling scalable, training-free model evolution.
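As a rough illustration of what gradient-free weight-space recombination can look like, the sketch below interpolates two toy "parents" block by block and searches the per-block mix ratios with a simple (1+1) evolution strategy. Everything here is an assumption made for illustration: the names `merge_weights`, `evolve_ratios`, and `toy_fitness`, the linear interpolation rule, the mutation scheme, and the toy fitness are not taken from the paper, whose actual genome and search procedure are not described on this page.

```python
import random


def merge_weights(parent_a, parent_b, ratios):
    """Interpolate two parents block by block.

    ratios[i] is the share of block i taken from parent_b (hypothetical rule).
    """
    return [
        [(1.0 - r) * wa + r * wb for wa, wb in zip(block_a, block_b)]
        for block_a, block_b, r in zip(parent_a, parent_b, ratios)
    ]


def evolve_ratios(parent_a, parent_b, fitness, generations=200, sigma=0.1, seed=0):
    """(1+1) evolution strategy: mutate the ratio vector, keep the child if it is no worse.

    Only fitness evaluations are used; no gradients are computed.
    """
    rng = random.Random(seed)
    best = [0.5] * len(parent_a)
    best_score = fitness(merge_weights(parent_a, parent_b, best))
    for _ in range(generations):
        # Gaussian mutation, clipped so every ratio stays in [0, 1].
        child = [min(1.0, max(0.0, r + rng.gauss(0.0, sigma))) for r in best]
        score = fitness(merge_weights(parent_a, parent_b, child))
        if score >= best_score:
            best, best_score = child, score
    return best, best_score


# Toy data: each parent is "good" at a different block.
parent_a = [[1.0, 1.0], [0.0, 0.0]]
parent_b = [[0.0, 0.0], [1.0, 1.0]]
target = [[1.0, 1.0], [1.0, 1.0]]


def toy_fitness(merged):
    """Negative squared error to the toy target (a stand-in for a benchmark score)."""
    return -sum(
        (w - t) ** 2
        for block, target_block in zip(merged, target)
        for w, t in zip(block, target_block)
    )


ratios, score = evolve_ratios(parent_a, parent_b, toy_fitness)
```

In this toy setting the search is rewarded for taking the first block from one parent and the second from the other, which mirrors the idea of reorganizing capabilities that already exist in the checkpoints rather than training new ones.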

Findings

01 Darwin-27B-Opus achieves 86.9% on GPQA Diamond, ranking #6 among 1,252 evaluated models.

02 Darwin models outperform their parent models without gradient-based training.

03 The framework supports recursive multi-generation evolution across different model architectures.

Abstract

We present Darwin Family, a framework for training-free evolutionary merging of large language models via gradient-free weight-space recombination. We ask whether frontier-level reasoning performance can be improved without additional training, by reorganizing latent capabilities already encoded in existing checkpoints. Darwin introduces three key ideas: (i) a 14-dimensional adaptive merge genome enabling fine-grained component- and block-level recombination; (ii) MRI-Trust Fusion, which adaptively balances diagnostic layer-importance signals with evolutionary search through a learnable trust parameter; and (iii) an Architecture Mapper that enables cross-architecture breeding between heterogeneous model families. Empirically, the flagship Darwin-27B-Opus achieves 86.9% on GPQA Diamond, ranking #6 among 1,252 evaluated models, and outperforming its fully trained foundation model without…
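The truncated abstract does not state how MRI-Trust Fusion combines its two signals. One simple way to picture a trust parameter is as the weight in a convex combination of per-layer merge ratios suggested by a diagnostic and per-layer ratios found by search. The sketch below assumes exactly that; the function name `fuse_with_trust`, the convex-combination rule, and the example values are hypothetical and should not be read as the paper's formula.

```python
def fuse_with_trust(diagnostic_ratios, evolved_ratios, trust):
    """Blend two per-layer ratio vectors with a single trust weight in [0, 1].

    trust = 1.0 follows the diagnostic signal only; trust = 0.0 follows the
    evolutionary search only. This convex combination is an assumed rule.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    if len(diagnostic_ratios) != len(evolved_ratios):
        raise ValueError("ratio vectors must have the same length")
    return [
        trust * d + (1.0 - trust) * e
        for d, e in zip(diagnostic_ratios, evolved_ratios)
    ]


# Hypothetical per-layer ratios for a two-layer toy model.
diagnostic = [0.2, 0.8]
evolved = [0.6, 0.4]
fused = fuse_with_trust(diagnostic, evolved, trust=0.5)
```

Under this assumption, the trust value could itself be one of the genes optimized by the search, so the procedure would decide how much to rely on the diagnostic signal rather than having that choice fixed in advance.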

Code & Models

Models

Datasets

FINAL-Bench/World-Model
dataset · 772 downloads
