Optimal Data Split Methodology for Model Validation
Rebecca Morrison, Corey Bryant, Gabriel Terejanu, Kenji Miki, Serge Prudhomme

TL;DR
This paper introduces an algorithm for optimally partitioning data into calibration and validation sets. The model is evaluated both on its predictions of a quantity of interest and on its ability to reproduce the data, and the validation set is chosen to challenge the model as strongly as possible. The method is demonstrated on a shock-tube experiment.
Contribution
The paper presents a systematic algorithm for data splitting that balances model evaluation and challenge, applicable across diverse scientific modeling contexts.
Findings
Algorithm effectively identifies optimal data partitions.
Framework improves model validation robustness.
Demonstrated on ICCD camera data from shock-tube experiments.
Abstract
The decision to incorporate cross-validation into validation processes of mathematical models raises an immediate question: how should one partition the data into calibration and validation sets? We answer this question systematically: we present an algorithm to find the optimal partition of the data subject to certain constraints. While doing this, we address two critical issues: 1) that the model be evaluated with respect to predictions of a given quantity of interest and its ability to reproduce the data, and 2) that the model be highly challenged by the validation set, assuming it is properly informed by the calibration set. This framework also relies on the interaction among the experimentalist and/or modeler, who understand the physical system and the limitations of the model; the decision-maker, who understands and can quantify the cost of model failure; and the computational…
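The constrained search described in the abstract can be illustrated with a small toy example. This is a minimal sketch of the general idea, not the authors' algorithm. Every ingredient is a hypothetical stand-in: the one-parameter model y = a·x, the data set `DATA`, the Fisher information Σx² used as the "properly informed by the calibration set" constraint, and the mean distance from each validation scenario to its nearest calibration scenario used as the "challenge" score. The quantity-of-interest criterion is not modeled. Exhaustive enumeration is only practical for small data sets.

```python
from itertools import combinations

# Hypothetical toy data: (scenario x, observation y) pairs.
DATA = [(0.5, 1.1), (1.0, 2.0), (1.5, 3.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def fisher_information(cal):
    """Fisher information of slope a in y = a*x with unit-variance noise.
    Used here as a hypothetical proxy for 'properly informed'."""
    return sum(x * x for x, _ in cal)


def challenge(cal, val):
    """Hypothetical challenge score: mean distance from each validation
    scenario to its nearest calibration scenario (degree of extrapolation)."""
    return sum(min(abs(xv - xc) for xc, _ in cal) for xv, _ in val) / len(val)


def calibrate(cal):
    """Least-squares estimate of the slope a for the model y = a*x."""
    return sum(x * y for x, y in cal) / sum(x * x for x, _ in cal)


def optimal_split(data, n_cal, min_info):
    """Search every calibration set of size n_cal and return
    (score, calibration, validation) for the split that maximizes the
    challenge score subject to fisher_information(cal) >= min_info.
    Ties keep the first split found; returns None if no split is feasible."""
    best = None
    for idx in combinations(range(len(data)), n_cal):
        cal = [data[i] for i in idx]
        val = [data[i] for i in range(len(data)) if i not in idx]
        if fisher_information(cal) < min_info:
            continue  # constraint violated: calibration set too uninformative
        score = challenge(cal, val)
        if best is None or score > best[0]:
            best = (score, cal, val)
    return best


if __name__ == "__main__":
    score, cal, val = optimal_split(DATA, n_cal=3, min_info=5.0)
    a = calibrate(cal)
    print("calibration:", cal)
    print("validation:", val)
    print("challenge score:", round(score, 3))
    # Validation step: how well does the calibrated model reproduce held-out data?
    print("max validation residual:", max(abs(y - a * x) for x, y in val))
```

With these settings the informativeness constraint rules out calibrating on only the three smallest scenarios, so the search settles on a split that still holds out the largest scenarios for validation. Raising `min_info` forces more informative (and typically less challenging) splits, which is the trade-off the abstract describes.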
Taxonomy
Topics: Probabilistic and Robust Engineering Design · Gaussian Processes and Bayesian Inference · Statistical Methods and Inference
