On minimizing the training set fill distance in machine learning regression
Paolo Climaco, Jochen Garcke

TL;DR
This paper investigates how selecting small, well-spaced training sets using Farthest Point Sampling (FPS) can reduce prediction errors and improve stability in regression models, especially under data and computational constraints.
Contribution
It derives an upper bound on the maximum expected prediction error that depends linearly on the training set fill distance, and shows empirically that selecting training sets with FPS reduces the maximum prediction error and improves model stability in regression tasks.
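For reference, the fill distance can be written in its standard form as below. The symbols are assumptions chosen for illustration and need not match the paper's notation: $X$ is the pool of unlabelled points, $X_L \subseteq X$ is the selected training set, and the norm is taken to be Euclidean.

```latex
% Fill distance of the selected set X_L within the pool X:
% the largest distance from any pool point to its nearest selected point.
h_{X_L, X} \;=\; \max_{x \in X} \; \min_{x_j \in X_L} \; \lVert x - x_j \rVert_2
```

A small fill distance means that every point in the pool has a selected training point nearby, which is the sense in which the training set is "well-spaced".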
Findings
Minimizing fill distance reduces maximum prediction error.
In the reported experiments, FPS significantly outperforms the other sampling methods considered.
Training set selection with FPS improves model stability.
Abstract
For regression tasks, one often leverages large datasets for training predictive machine learning models. However, using large datasets may not be feasible due to computational limitations or high data labelling costs. Therefore, suitably selecting small training sets from large pools of unlabelled data points is essential to maximize model performance while maintaining efficiency. In this work, we study Farthest Point Sampling (FPS), a data selection approach that aims to minimize the fill distance of the selected set. We derive an upper bound for the maximum expected prediction error, conditional on the location of the unlabelled data points, that linearly depends on the training set fill distance. For empirical validation, we perform experiments using two regression models on three datasets. We empirically show that selecting a training set by aiming to minimize the fill distance,…
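The selection strategy described in the abstract can be illustrated with a minimal sketch of greedy farthest point sampling and a fill distance computation. This is not the authors' code: the function names `farthest_point_sampling` and `fill_distance`, the `start_index` parameter, and the choice of Euclidean distance are assumptions made for illustration.

```python
import numpy as np


def farthest_point_sampling(points, n_select, start_index=0):
    """Greedily pick n_select row indices of `points` (hypothetical helper).

    Each step adds the point that is farthest from the already selected set,
    measured by Euclidean distance (an assumption of this sketch).
    """
    points = np.asarray(points, dtype=float)
    n_points = points.shape[0]
    if not 1 <= n_select <= n_points:
        raise ValueError("n_select must be between 1 and the number of points")

    selected = [start_index]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(points - points[start_index], axis=1)

    for _ in range(n_select - 1):
        next_index = int(np.argmax(min_dist))
        selected.append(next_index)
        new_dist = np.linalg.norm(points - points[next_index], axis=1)
        min_dist = np.minimum(min_dist, new_dist)

    return np.array(selected)


def fill_distance(points, selected):
    """Largest distance from any pool point to its nearest selected point."""
    points = np.asarray(points, dtype=float)
    centers = points[selected]
    # Pairwise distances, shape (n_points, n_selected).
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return float(dists.min(axis=1).max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = rng.uniform(size=(500, 2))  # synthetic unlabelled pool
    fps_idx = farthest_point_sampling(pool, 20)
    rand_idx = rng.choice(len(pool), size=20, replace=False)
    print("FPS fill distance:   ", fill_distance(pool, fps_idx))
    print("random fill distance:", fill_distance(pool, rand_idx))
```

Only the features of the pool are used here, so the selection can be made before any labels are collected. The greedy loop costs one distance pass over the pool per selected point; `fill_distance` builds a full pairwise distance matrix and is meant for small pools only.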
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Face and Expression Recognition
