Reducing cross-sample prediction churn in scientific machine learning

Gordan Prastalo; Kevin Maik Jablonka

arXiv:2605.13826 · cs.LG · May 14, 2026

TL;DR

This paper investigates cross-sample prediction churn in scientific machine learning. It shows that data-side methods (bootstrap bagging and twin-bootstrap) reduce churn, whereas standard parameter-side techniques do not.

Contribution

It introduces twin-bootstrap, a training method in which two networks are trained jointly on independent bootstraps with a symmetric KL consistency loss. It also shows that data-side methods reduce prediction churn where standard parameter-side techniques do not.
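The abstract describes twin-bootstrap as two networks trained jointly on independent bootstraps with a sym-KL (symmetric Kullback-Leibler) consistency loss. The abstract is cut off before it says which inputs the loss is applied to or how it is weighted. The block below is a minimal pure-Python sketch of such an objective under stated assumptions, not the paper's exact loss. Assumed: each network gets a cross-entropy term on its own bootstrap minibatch; the consistency term compares both networks' outputs on the same inputs; and a hypothetical weight `lam` scales that term. All function names are illustrative. Symmetric KL is taken here as KL(p||q) + KL(q||p). Some authors halve this sum.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax of one logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def sym_kl(logits_a: Sequence[float], logits_b: Sequence[float]) -> float:
    """KL(p || q) + KL(q || p) with p = softmax(logits_a), q = softmax(logits_b).

    Uses the identity KL(p||q) + KL(q||p) = sum_c (p_c - q_c) * (log p_c - log q_c).
    """
    la, lb = log_softmax(logits_a), log_softmax(logits_b)
    return sum((math.exp(x) - math.exp(y)) * (x - y) for x, y in zip(la, lb))


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the true class."""
    return -log_softmax(logits)[label]


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def twin_bootstrap_loss(logits_a, labels_a, logits_b, labels_b,
                        shared_a, shared_b, lam: float = 1.0) -> float:
    """Hypothetical twin-bootstrap objective (assumed form, not the paper's exact loss).

    logits_a / labels_a: network A's outputs on a minibatch drawn from bootstrap A.
    logits_b / labels_b: network B's outputs on a minibatch drawn from bootstrap B.
    shared_a / shared_b: both networks' outputs on the SAME inputs (an assumption),
    where the symmetric-KL consistency term is applied with weight lam (an assumption).
    """
    ce_a = mean(cross_entropy(z, y) for z, y in zip(logits_a, labels_a))
    ce_b = mean(cross_entropy(z, y) for z, y in zip(logits_b, labels_b))
    consistency = mean(sym_kl(za, zb) for za, zb in zip(shared_a, shared_b))
    return ce_a + ce_b + lam * consistency
```

For any logit vector `z`, `sym_kl(z, z)` is exactly `0.0`. The term grows as the two networks' predictive distributions diverge, so minimizing it pushes the twins toward the same labels. In practice the same expression would be written in an autodiff framework such as PyTorch or JAX, so that gradients reach both networks.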

Findings

1. K-bootstrap bagging reduces churn by 40-54% on every dataset without accuracy loss.
2. Twin-bootstrap reduces churn by a further median 45% beyond bagging-2.
3. Standard parameter-side methods (deep ensembles, MC dropout, stochastic weight averaging) do not reduce cross-sample prediction churn.
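The first finding concerns K-bootstrap bagging. The block below is a minimal pure-Python sketch of it. It assumes bagging means fitting K models on independent bootstrap resamples (n draws with replacement) and averaging their predicted class probabilities; majority voting is a common alternative. The nearest-centroid base learner and all function names (`fit_centroids`, `predict_proba`, `bagged_predict`, `churn`) are hypothetical stand-ins for the paper's neural networks. Churn is measured as the abstract defines it: the fraction of test points on which two independently seeded runs assign different labels.

```python
import math
import random
from typing import Optional, Sequence

Vector = Sequence[float]


def fit_centroids(X: Sequence[Vector], y: Sequence[int], n_classes: int) -> list[Optional[list[float]]]:
    """Hypothetical toy base learner: per-class feature means.

    A class missing from a bootstrap resample gets no centroid (None).
    """
    dim = len(X[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, label in zip(X, y):
        counts[label] += 1
        for j, v in enumerate(x):
            sums[label][j] += v
    return [[s / c for s in row] if c else None for row, c in zip(sums, counts)]


def predict_proba(centroids, x: Vector) -> list[float]:
    """Softmax over negative squared distances; classes without a centroid get probability 0."""
    scores = [-math.inf if c is None else -sum((a - b) ** 2 for a, b in zip(x, c))
              for c in centroids]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
    total = sum(weights)
    return [w / total for w in weights]


def bagged_predict(X_train, y_train, X_test, n_classes: int, K: int, seed: int) -> list[int]:
    """K-bootstrap bagging: fit K models on bootstrap resamples, average their
    class probabilities, and return the argmax label for each test point."""
    rng = random.Random(seed)
    n = len(X_train)
    avg = [[0.0] * n_classes for _ in X_test]
    for _ in range(K):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        model = fit_centroids([X_train[i] for i in idx], [y_train[i] for i in idx], n_classes)
        for row, x in zip(avg, X_test):
            for c, p in enumerate(predict_proba(model, x)):
                row[c] += p / K
    return [max(range(n_classes), key=row.__getitem__) for row in avg]


def churn(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Fraction of test points on which two predictors assign different labels."""
    return sum(a != b for a, b in zip(labels_a, labels_b)) / len(labels_a)


if __name__ == "__main__":
    # Two overlapping 1-D Gaussian classes (toy data, not the paper's benchmarks).
    data_rng = random.Random(0)
    X = [[data_rng.gauss(0.0, 1.0)] for _ in range(100)] + [[data_rng.gauss(1.0, 1.0)] for _ in range(100)]
    y = [0] * 100 + [1] * 100
    X_test = [[t / 10] for t in range(-20, 31)]
    for K in (1, 8):
        run_a = bagged_predict(X, y, X_test, 2, K, seed=1)
        run_b = bagged_predict(X, y, X_test, 2, K, seed=2)
        print(f"K={K}: churn between two seeded runs = {churn(run_a, run_b):.3f}")
```

With K = 1, each run is a single bootstrap model, which matches the baseline setup in the abstract. Larger K costs K times the training compute. Numbers from this toy demo say nothing about the paper's chemistry results.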

Abstract

Scientific machine learning reports predictive performance. It does not report whether the same prediction would survive a different draw of training data. Across 9 chemistry benchmarks, two classifiers trained on independent bootstraps of the same training set agree on aggregate accuracy to within 1.3-4.2 percentage points but disagree on the class label of 8.0-21.8% of test molecules. We call this gap cross-sample prediction churn. The standard parameter-side techniques (deep ensembles, MC dropout, stochastic weight averaging) do not reduce this gap; two data-side methods do. The first is $K$-bootstrap bagging, which cuts the churn rate by 40-54% on every dataset at no accuracy cost ($K\times$ ERM compute). The second is twin-bootstrap, our proposal: two networks trained jointly on independent bootstraps with a sym-KL (symmetric KL) consistency loss between…
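The abstract's observation, that accuracies match closely while labels often differ, follows from two elementary bounds. The derivation below is a standard argument added for illustration and is not taken from the paper. Let $A$ and $B$ be the two classifiers, $\mathrm{acc}_A$ and $\mathrm{acc}_B$ their test accuracies, and $c$ the churn (the fraction of test points with different predicted labels). Wherever the labels agree, the two classifiers are either both correct or both wrong, so:

```latex
\mathrm{acc}_A - \mathrm{acc}_B
  = \Pr[A\ \text{correct},\ B\ \text{wrong}] - \Pr[A\ \text{wrong},\ B\ \text{correct}]
  \le \Pr[A\ \text{correct},\ B\ \text{wrong}] \le c .
% Swapping A and B gives the lower bound; a disagreement needs at least one error:
|\mathrm{acc}_A - \mathrm{acc}_B| \;\le\; c \;\le\; \Pr[A\ \text{wrong} \lor B\ \text{wrong}]
  \;\le\; (1 - \mathrm{acc}_A) + (1 - \mathrm{acc}_B).
```

An accuracy gap of $\Delta$ percentage points therefore forces churn of only at least $\Delta$%. Churn can be as large as the sum of the two error rates, so matching aggregate accuracy is weak evidence that individual predictions match.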