AutoMixer: Checkpoint Artifacts as Automatic Data Mixers

Ernie Chang; Yang Li; Patrick Huber; Vish Vogeti; David Kant; Yangyang Shi; Vikas Chandra

arXiv:2506.21910 · cs.CL · February 10, 2026


TL;DR

AutoMixer treats checkpoints saved during training as sources of data signals, using the capabilities they exhibit on benchmarks to choose data mixtures for language model pretraining, with gains of up to 1.93% on reasoning benchmarks.

Contribution

The paper introduces a method that uses checkpoint models as automatic data mixers, selecting them by the capabilities that emerge at different points in the training trajectory.

Findings

01 Significant performance improvements on eight reasoning benchmarks

02 Checkpoint models can be used effectively as sources of in-training data signals

03 Up to 1.93% performance gain in the pretraining setting

Abstract

In language model training, it is desirable to equip models with capabilities from various tasks. However, it is not clear how to directly obtain the right data mixtures for these capabilities, as the relationship between data and tasks is difficult to model. In this work, we observe that checkpoint models exhibit emerging capabilities at different points in the training trajectory. The training process often saves checkpoints as artifacts, and these are under-utilized as a source of in-training data signals. We identify these artifact models based on their respective capabilities on the benchmarks and leverage them as data mixers by using their aggregated first-order influence approximation over source data. We demonstrate on eight reasoning benchmarks that the proposed framework yields significant improvements in the pretraining setting, with performance improvements of up to 1.93%.…
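The abstract does not spell out how the aggregated first-order influence is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the influence of a training example on a benchmark is approximated by the dot product of their loss gradients at a given checkpoint, that scores are summed across checkpoints, and that per-source averages are turned into mixture weights with a softmax. The function names, the `temperature` parameter, and the softmax step are all hypothetical choices made for illustration.

```python
import numpy as np


def first_order_influence(train_grads, bench_grad):
    """Gradient dot product between each training example and a benchmark.

    train_grads: (num_examples, dim) per-example loss gradients at one checkpoint.
    bench_grad:  (dim,) benchmark loss gradient at the same checkpoint.
    Assumption: this dot product stands in for the first-order influence.
    """
    return train_grads @ bench_grad


def mixture_weights(ckpt_train_grads, ckpt_bench_grads, source_ids,
                    num_sources, temperature=1.0):
    """Aggregate influence over checkpoints and map it to data-source weights.

    ckpt_train_grads: list of (num_examples, dim) arrays, one per checkpoint.
    ckpt_bench_grads: list of (dim,) arrays, one per checkpoint.
    source_ids:       (num_examples,) integer data-source label per example.
    """
    total = np.zeros(ckpt_train_grads[0].shape[0])
    # Hypothetical aggregation rule: plain sum over checkpoints.
    for train_grads, bench_grad in zip(ckpt_train_grads, ckpt_bench_grads):
        total += first_order_influence(train_grads, bench_grad)

    # Mean aggregated influence per data source.
    sums = np.bincount(source_ids, weights=total, minlength=num_sources)
    counts = np.bincount(source_ids, minlength=num_sources)
    per_source = sums / np.maximum(counts, 1)

    # Hypothetical normalisation: softmax so the weights sum to one.
    scaled = (per_source - per_source.max()) / temperature
    weights = np.exp(scaled)
    return weights / weights.sum()


# Toy data: two checkpoints, four examples, two sources.
toy_train = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
toy_bench = np.array([1.0, 0.0])
toy_sources = np.array([0, 0, 1, 1])
toy_weights = mixture_weights([toy_train, toy_train], [toy_bench, toy_bench],
                              toy_sources, num_sources=2)
```

In the toy data, the examples from source 0 have gradients aligned with the benchmark gradient, so that source receives the larger sampling weight. In practice the gradients would come from the saved checkpoints themselves, and they are usually too large to store per example without some form of projection or layer selection.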


Taxonomy

Topics: Topic Modeling · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning