Reducing cross-sample prediction churn in scientific machine learning
Gordan Prastalo, Kevin Maik Jablonka

TL;DR
This paper investigates cross-sample prediction churn in scientific machine learning and shows that data-side methods (bootstrap bagging and twin-bootstrap) substantially reduce churn, whereas standard parameter-side techniques do not.
Contribution
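It introduces twin-bootstrap, a training method in which two networks are trained jointly on independent bootstraps with a symmetric-KL (sym-KL) consistency loss. It also shows that data-side methods reduce prediction churn where standard parameter-side techniques do not.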
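The abstract describes twin-bootstrap as two networks trained jointly on independent bootstraps with a sym-KL consistency loss, but the text is truncated before the objective is fully specified. The numpy sketch below shows one plausible form of the joint loss, assuming that each twin's cross-entropy is computed on its own bootstrap, that the sym-KL term is computed on a batch both twins see, and that `lam` is a hypothetical weight on the consistency term. None of these choices is confirmed by the source, and random logits stand in for real network outputs.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of integer class labels.
    p = softmax(logits)
    return float(-np.log(p[np.arange(len(labels)), labels] + 1e-12).mean())

def sym_kl(p, q, eps=1e-12):
    # KL(p||q) + KL(q||p) = sum_c (p_c - q_c)(log p_c - log q_c), averaged over examples.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum((p - q) * (np.log(p) - np.log(q)), axis=-1).mean())

def twin_bootstrap_loss(logits_a, y_a, logits_b, y_b, shared_a, shared_b, lam=1.0):
    # Each twin fits its own bootstrap (logits_a on y_a, logits_b on y_b).
    # shared_a / shared_b: the twins' logits on one common batch (an assumption).
    # lam: hypothetical weight on the consistency term.
    fit = cross_entropy(logits_a, y_a) + cross_entropy(logits_b, y_b)
    consistency = sym_kl(softmax(shared_a), softmax(shared_b))
    return fit + lam * consistency

rng = np.random.default_rng(0)
n, n_classes = 64, 3
y = rng.integers(0, n_classes, size=n)
idx_a = rng.integers(0, n, size=n)  # bootstrap draw for twin A
idx_b = rng.integers(0, n, size=n)  # independent bootstrap draw for twin B
# Random logits stand in for each network's outputs on the full training set.
logits_a = rng.normal(size=(n, n_classes))
logits_b = rng.normal(size=(n, n_classes))
loss = twin_bootstrap_loss(
    logits_a[idx_a], y[idx_a], logits_b[idx_b], y[idx_b], logits_a, logits_b
)
print(loss)
```

Because sym-KL is symmetric in its two arguments, neither twin is treated as the "teacher"; the consistency term pulls both predictive distributions toward each other.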
Findings
Bootstrap bagging reduces churn by 40-54% with no loss in accuracy.
Twin-bootstrap reduces churn by a further 45% (median) relative to bagging-2.
Standard parameter-side methods do not reduce cross-sample prediction churn.
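As we read the Findings, each percentage is a relative reduction: one minus the method's churn rate divided by a baseline's churn rate, with the twin-bootstrap figure taken as a median over datasets. A minimal Python sketch of that arithmetic follows. The dataset names and churn values in the usage example are made up for illustration and are not results from the paper.

```python
import statistics
from typing import Mapping

def relative_reduction(baseline_churn: float, method_churn: float) -> float:
    # Share of the baseline's churn removed by the method: 0.45 reads as "45% less churn".
    if baseline_churn <= 0:
        raise ValueError("baseline churn must be positive")
    return 1.0 - method_churn / baseline_churn

def median_reduction(baseline: Mapping[str, float], method: Mapping[str, float]) -> float:
    # Median of per-dataset relative reductions, over datasets present in both mappings.
    shared = sorted(baseline.keys() & method.keys())
    return statistics.median(relative_reduction(baseline[d], method[d]) for d in shared)

# Hypothetical churn rates (fraction of test labels that flip), NOT taken from the paper.
bagging_2 = {"dataset_a": 0.20, "dataset_b": 0.10, "dataset_c": 0.08}
twin = {"dataset_a": 0.10, "dataset_b": 0.06, "dataset_c": 0.02}
print(median_reduction(bagging_2, twin))  # prints 0.5
```

Reporting the median rather than the mean keeps one dataset with an unusually large or small reduction from dominating the summary figure.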
Abstract
Scientific machine learning reports predictive performance. It does not report whether the same prediction would survive a different draw of training data. Across chemistry benchmarks, two classifiers trained on independent bootstraps of the same training set agree on aggregate accuracy to within percentage points but disagree on the class label of of test molecules. We call this gap cross-sample prediction churn. The standard parameter-side techniques (deep ensembles, MC dropout, stochastic weight averaging) do not reduce this gap; two data-side methods do. The first is -bootstrap bagging, which cuts the rate on every dataset at no accuracy cost (-ERM compute). The second is twin-bootstrap, our proposal: two networks trained jointly on independent bootstraps with a sym-KL consistency loss between…
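The churn measurement described in the abstract can be illustrated with a toy, numpy-only sketch. To keep it self-contained, we swap in a nearest-centroid classifier on synthetic Gaussian data for the paper's networks and chemistry benchmarks, and we assume that bagging averages the predicted class probabilities of `k` bootstrap models (the abstract elides the bagging size and does not say how members are combined). Churn here is the fraction of test points on which two independently trained predictors assign different labels.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    # Toy stand-in for a network: one mean vector per class.
    # If a bootstrap happens to miss a class, fall back to the global mean.
    fallback = X.mean(axis=0)
    return np.stack([X[y == c].mean(axis=0) if np.any(y == c) else fallback
                     for c in range(n_classes)])

def predict_proba(centroids, X):
    # Softmax over negative squared distances to each class centroid.
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def train_predictor(rng, X, y, X_test, n_classes, k=1):
    # k = 1: one model on one bootstrap; k > 1: bagging over k bootstrap models,
    # combined by averaging probabilities (an assumption, not confirmed by the source).
    probs = np.zeros((len(X_test), n_classes))
    for _ in range(k):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap: sample n rows with replacement
        probs += predict_proba(fit_centroids(X[idx], y[idx], n_classes), X_test)
    return (probs / k).argmax(axis=1)

def churn(rng, X, y, X_test, n_classes, k=1):
    # Train two predictors on independent bootstrap draws and compare their test labels.
    a = train_predictor(rng, X, y, X_test, n_classes, k)
    b = train_predictor(rng, X, y, X_test, n_classes, k)
    return float(np.mean(a != b))

rng = np.random.default_rng(0)
n, dim = 200, 2
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, dim)) + y[:, None] * 1.0   # two overlapping Gaussian classes
X_test = rng.normal(size=(500, dim)) + 0.5          # test points near the class boundary
print("single model:", churn(rng, X, y, X_test, 2, k=1))
print("bagged, k=8:", churn(rng, X, y, X_test, 2, k=8))
```

Comparing `k=1` with a larger `k` on the same data mirrors the kind of comparison the abstract describes; this toy setup is not expected to reproduce the paper's numbers.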
