Parametrising the Inhomogeneity Inducing Capacity of a Training Set, and its Impact on Supervised Learning
Gargi Roy, Dalia Chakrabarty

TL;DR
This paper introduces an inhomogeneity parameter for a training dataset. The parameter quantifies the property of the data that necessitates an inhomogeneous correlation structure for the learnt function. It thereby determines whether a non-stationary model is needed in Gaussian Process learning, and it affects prediction quality.
Contribution
The paper defines the inhomogeneity parameter of a training set, computes it for multiple datasets, and links a non-zero value to the necessity of non-stationary models in Gaussian Process learning.
Findings
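The inhomogeneity parameter is easy to compute for small-to-large datasets, as demonstrated on multiple publicly available datasets.
Within Gaussian Process-based learning, a non-zero inhomogeneity parameter requires a non-stationary model.
Prediction quality is influenced by the inhomogeneity of the training set.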
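To make the stationary versus non-stationary distinction in the findings concrete, the following is a minimal sketch in Python with NumPy. It does not compute the paper's inhomogeneity parameter, whose definition is not reproduced in this summary. It only contrasts a stationary squared-exponential kernel with the standard one-dimensional Gibbs kernel, which is non-stationary. The function names and the length-scale function `illustrative_length_scale` are assumptions chosen for illustration, not taken from the paper.

```python
import numpy as np


def rbf_kernel(x1, x2, length_scale=1.0):
    """Stationary squared-exponential kernel: depends only on x1 - x2."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gibbs_kernel(x1, x2, length_scale_fn):
    """Non-stationary 1-D Gibbs kernel with an input-dependent length scale."""
    l1 = length_scale_fn(x1)[:, None]
    l2 = length_scale_fn(x2)[None, :]
    d = x1[:, None] - x2[None, :]
    s = l1 ** 2 + l2 ** 2
    return np.sqrt(2.0 * l1 * l2 / s) * np.exp(-(d ** 2) / s)


def illustrative_length_scale(x):
    # Hypothetical choice: correlation length grows away from the origin.
    return 0.5 + np.abs(x)


x = np.array([0.0, 1.0])
shift = 3.0  # translate both inputs by the same amount

k_stat = rbf_kernel(x, x)
k_stat_shifted = rbf_kernel(x + shift, x + shift)
k_gibbs = gibbs_kernel(x, x, illustrative_length_scale)
k_gibbs_shifted = gibbs_kernel(x + shift, x + shift, illustrative_length_scale)

# A stationary kernel is unchanged by a common shift of its inputs;
# a non-stationary kernel generally is not.
stationary_is_shift_invariant = np.allclose(k_stat, k_stat_shifted)
gibbs_is_shift_invariant = np.allclose(k_gibbs, k_gibbs_shifted)
print(stationary_is_shift_invariant, gibbs_is_shift_invariant)
```

The shift test is the defining check: a stationary covariance gives the same correlation between two inputs wherever the pair sits in input space, while the Gibbs kernel lets that correlation vary with location.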
Abstract
We introduce a parametrisation of that property of the available training dataset that necessitates an inhomogeneous correlation structure for the function that is learnt as a model of the relationship between the pair of variables whose observations comprise the training data. We refer to this parametrisation, for a given training set, as its "inhomogeneity parameter". This parameter is easy to compute for small-to-large datasets, and we demonstrate such computation on multiple publicly available datasets, while also demonstrating that conventional "non-stationarity" of data does not imply a non-zero inhomogeneity parameter of the dataset. We prove that, within the probabilistic Gaussian Process-based learning approach, a training set with a non-zero inhomogeneity parameter renders it imperative that the process that is…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Bayesian Modeling and Causal Inference · Bayesian Methods and Mixture Models
