Shift is Good: Mismatched Data Mixing Improves Test Performance

Marko Medvedev; Kaifeng Lyu; Zhiyuan Li; Nathan Srebro

arXiv:2510.25108 · cs.LG · November 11, 2025

TL;DR

This paper shows that training on a mixture whose component proportions differ from the test proportions can improve test performance. Distribution shift can therefore be beneficial even when the components are unrelated and there is no transfer between them.

Contribution

The paper analyzes training and testing on mixture distributions with different proportions, identifies the optimal training proportions in a variety of scenarios, and characterizes the extent to which such distribution shift can be beneficial.

Findings

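01. Distribution shift can improve test performance in many settings, and in some sense generically.

02. The optimal training proportions, and the size of the benefit from mismatching them, depend on the scenario.

03. The same analysis extends to a compositional setting in which the distribution of component skills differs between training and test.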

Abstract

We consider training and testing on mixture distributions with different training and test proportions. We show that in many settings, and in some sense generically, distribution shift can be beneficial, and test performance can improve due to mismatched training proportions, even if the components are unrelated and with no transfer between components. In a variety of scenarios, we identify the optimal training proportions and the extent to which such distribution shift can be beneficial. We show how the same analysis applies also to a compositional setting with differing distribution of component "skills" at training and test.
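
To make the abstract's claim concrete, here is a minimal toy sketch. It is not taken from the paper. It assumes (hypothetically) that each component is learned independently, with no transfer, and that a component trained on m samples has error m^(-alpha). Under that assumed power law, the training proportions that minimize expected test error are proportional to p_k^(1/(1+alpha)) rather than to the test proportions p_k themselves. The function names and the error model are illustrative choices only.

```python
def expected_test_error(test_props, train_props, n_total, alpha=1.0):
    """Expected test error under an ASSUMED toy model (not the paper's).

    Component k receives n_total * train_props[k] training samples and is
    assumed to have error (samples)^(-alpha). Components do not share
    information, so the test error is the test-weighted sum of the
    per-component errors. Every component with positive test weight must
    have a positive training proportion.
    """
    return sum(
        p * (n_total * q) ** (-alpha)
        for p, q in zip(test_props, train_props)
        if p > 0
    )


def power_law_optimal_props(test_props, alpha=1.0):
    """Training proportions minimizing the toy objective above.

    Setting the derivative of sum_k p_k * (n * q_k)^(-alpha) to a common
    Lagrange multiplier (constraint: sum_k q_k = 1) gives
    q_k proportional to p_k^(1 / (1 + alpha)).
    """
    weights = [p ** (1.0 / (1.0 + alpha)) for p in test_props]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    test_props = [0.9, 0.1]   # hypothetical test mixture
    n_total = 1000            # hypothetical training budget

    matched = expected_test_error(test_props, test_props, n_total)
    shifted_props = power_law_optimal_props(test_props)
    shifted = expected_test_error(test_props, shifted_props, n_total)

    print("matched proportions:", test_props, "error:", round(matched, 6))
    print("shifted proportions:", [round(q, 4) for q in shifted_props],
          "error:", round(shifted, 6))
```

In this toy case with alpha = 1, the shifted proportions come out to 0.75 and 0.25, and the expected error falls from about 0.0020 (matched) to about 0.0016 (shifted). The rare component is oversampled because, under the assumed error curve, its marginal sample reduces error more. The scenarios and optimal proportions analyzed in the paper may differ from this sketch.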
