Information FOMO: The unhealthy fear of missing out on information. A method for removing misleading data for healthier models
Ethan Pickering, Themistoklis P. Sapsis

TL;DR
This paper introduces a Bayesian sequential data selection method that filters out misleading information, improving model accuracy and stability without traditional data partitioning. It applies to both Gaussian processes and neural networks.
Contribution
The paper presents a novel Bayesian data selection approach that dynamically couples data with the model, improving error convergence and eliminating the need for separate validation sets.
Findings
Improves sample-wise error convergence and prevents cases where adding data degrades performance.
Eliminates sample-wise double descent in surrogate models.
Applicable to both Gaussian process and neural network models.
Abstract
Misleading or unnecessary data can have outsized impacts on the health or accuracy of Machine Learning (ML) models. We present a Bayesian sequential selection method, akin to Bayesian experimental design, that identifies critically important information within a dataset, while ignoring data that is either misleading or brings unnecessary complexity to the surrogate model of choice. Our method improves sample-wise error convergence and eliminates instances where more data leads to worse performance and instabilities of the surrogate model, often termed sample-wise "double descent". We find these instabilities are a result of the complexity of the underlying map and linked to extreme events and heavy tails. Our approach has two key features. First, the selection algorithm dynamically couples the chosen model and data. Data is chosen based on its merits towards improving the selected…
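To make the idea of sequential selection from a fixed dataset concrete, here is a minimal sketch of a greedy loop that grows a training subset one point at a time and refits a Gaussian process after each addition. It is not the authors' method: the acquisition rule used here (largest posterior variance, i.e. plain uncertainty sampling) is a stand-in, and the names `rbf`, `gp_posterior`, and `sequential_select`, as well as the kernel length scale and noise level, are assumptions made only for illustration.

```python
import numpy as np

def rbf(a, b, length=0.5):
    # Squared-exponential kernel between two 1-D arrays of inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-2):
    # Posterior mean and variance of a zero-mean GP with unit prior variance.
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_query, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

def sequential_select(x_pool, y_pool, n_select, seed_idx=0):
    # Greedily pick pool indices; the model is refit on the chosen subset
    # before every new pick, so model and data are updated together.
    chosen = [seed_idx]
    remaining = [i for i in range(len(x_pool)) if i != seed_idx]
    while len(chosen) < n_select and remaining:
        _, var = gp_posterior(x_pool[chosen], y_pool[chosen], x_pool[remaining])
        # Stand-in acquisition (assumption): take the most uncertain point.
        best = remaining[int(np.argmax(var))]
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical toy dataset: 50 samples of a smooth 1-D map.
x = np.linspace(-2.0, 2.0, 50)
y = np.sin(3.0 * x)
idx = sequential_select(x, y, n_select=8)
```

Note that the variance-based rule above never looks at the labels when ranking candidates, so it cannot by itself flag misleading data. The abstract describes a criterion that judges data on its merits for the chosen model; swapping that criterion in would only change the line that computes `best`.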
Taxonomy
Topics: Health, Environment, Cognitive Aging · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
