The N-ary in the Coal Mine: Avoiding Mixture Model Failure with Proper Validation
Travis Maxfield, Joshua Hochuli, James Wellnitz, Cleber Melo-Filho, Konstantin I. Popov, Eugene Muratov, and Alex Tropsha

TL;DR
This paper extends validation strategies for mixture modeling from binary to N-ary mixtures, emphasizing the importance of proper baseline performance measures to avoid overestimating model accuracy in complex mixture data.
Contribution
It introduces generalized validation strategies for N-ary mixture models and proposes a baseline performance method to improve model comparison accuracy.
Findings
Validation strategies are applicable to N-ary mixtures.
Baseline performance measures prevent overestimation of model accuracy.
Case studies demonstrate the effectiveness of proposed methods.
Abstract
Modeling the properties of chemical mixtures is a difficult but important part of any modeling process intended to be applicable to the often messy and impure phenomena of everyday life, including food and environmental safety, healthcare, etc. Part of this difficulty stems from the increased complexity of designing suitable model validation schemes for mixture data, a fact which has been elucidated in previous work only in the case of binary mixture models. We extend these previously defined validation strategies for QSAR modeling of binary mixtures to the more complex case of general, N-ary mixtures and argue that these strategies are applicable to many modeling tasks beyond simple chemical mixtures. Additionally, we propose a method of establishing a baseline model performance for each mixture dataset to be used in model selection comparisons. This baseline is intended to…
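To make the validation concern concrete, the following is a minimal sketch of one leakage-avoiding split for N-ary mixture data. It is an illustration only, not the authors' procedure: the function name `compound_out_split`, its parameters, and the rule "a mixture goes to the test set if it contains any held-out compound" are all assumptions introduced here. The idea is that a random split over mixtures can place the same compounds on both sides of the split, which may inflate apparent accuracy, whereas holding out whole compounds keeps them unseen during training.

```python
import random


def compound_out_split(mixtures, test_fraction=0.25, seed=0):
    """Split mixtures so that held-out compounds never appear in training.

    mixtures: list of iterables of component identifiers (any size N >= 1).
    Returns (train_indices, test_indices, held_out_compounds).
    Hypothetical helper for illustration; not taken from the paper.
    """
    # Collect every distinct compound that occurs in any mixture.
    compounds = sorted({c for m in mixtures for c in m})

    # Choose a reproducible random subset of compounds to hold out.
    rng = random.Random(seed)
    rng.shuffle(compounds)
    n_held = max(1, round(test_fraction * len(compounds)))
    held_out = set(compounds[:n_held])

    # Assumed rule: any mixture touching a held-out compound is a test mixture.
    train, test = [], []
    for i, m in enumerate(mixtures):
        (test if set(m) & held_out else train).append(i)
    return train, test, held_out


# Toy data: binary and ternary mixtures over six made-up compounds.
example = [("A", "B"), ("A", "B", "C"), ("C", "D"), ("D", "E", "F"), ("B", "E")]
train_idx, test_idx, held = compound_out_split(example, test_fraction=0.3, seed=1)
print("held-out compounds:", sorted(held))
print("train mixtures:", [example[i] for i in train_idx])
print("test mixtures:", [example[i] for i in test_idx])
```

Note that with many components per mixture this rule can push most mixtures into the test set, so in practice the held-out fraction would need tuning against the resulting train and test sizes.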
Taxonomy
Topics: Reservoir Engineering and Simulation Methods · Fault Detection and Control Systems
