Valid Inference for Machine Learning Model Parameters
Neil Dey, Jonathan P. Williams

TL;DR
This paper develops a method to construct valid confidence sets for the optimal parameters of machine learning models using only training data, enabling reliable inference without population knowledge.
Contribution
It introduces an approach to valid inference on model parameters: confidence sets for the population-optimal parameter are built from the training data alone, and the distribution of these sets can be approximated with bootstrap methods.
Findings
The confidence sets are valid for the model's optimal parameter on the entire population
Bootstrapping approximates the distribution of the confidence set well
Valid inference requires only the training data, with no knowledge of the population
Abstract
The parameters of a machine learning model are typically learned by minimizing a loss function on a set of training data. However, this can come with the risk of overtraining; in order for the model to generalize well, it is of great importance that we are able to find the optimal parameter for the model on the entire population -- not only on the given training sample. In this paper, we construct valid confidence sets for this optimal parameter of a machine learning model, which can be generated using only the training data without any knowledge of the population. We then show that studying the distribution of this confidence set allows us to assign a notion of confidence to arbitrary regions of the parameter space, and we demonstrate that this distribution can be well-approximated using bootstrapping techniques.
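To make the bootstrapping idea concrete, the following is a minimal sketch, not the paper's confidence set construction. It swaps in a plain nonparametric bootstrap of an empirical risk minimizer (ordinary least squares on synthetic data) and reports how often the refitted parameter lands in a chosen region of parameter space. The helper names `fit_least_squares`, `bootstrap_minimizers`, and `region_frequency`, the synthetic data, and the radius of 0.2 are all assumptions made for illustration.

```python
import numpy as np


def fit_least_squares(X, y):
    """Empirical risk minimizer for squared loss (ordinary least squares)."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def bootstrap_minimizers(X, y, n_boot=500, seed=0):
    """Refit the model on n_boot resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    thetas = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample row indices
        thetas[b] = fit_least_squares(X[idx], y[idx])
    return thetas


def region_frequency(thetas, in_region):
    """Fraction of bootstrap minimizers that fall inside a region."""
    return float(np.mean([in_region(t) for t in thetas]))


# Synthetic training sample (illustrative only; not data from the paper).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_theta = np.array([1.0, -2.0])
y = X @ true_theta + rng.normal(scale=0.5, size=200)

theta_hat = fit_least_squares(X, y)
thetas = bootstrap_minimizers(X, y)

# Example region: a ball of radius 0.2 around the fitted parameter.
freq = region_frequency(
    thetas, lambda t: np.linalg.norm(t - theta_hat) < 0.2
)
print("fitted parameter:", theta_hat)
print("bootstrap frequency of region:", freq)
```

The frequency printed at the end is only a bootstrap proxy for how much confidence to assign to the region; the paper's own sets carry the validity guarantee described in the abstract, which this sketch does not reproduce.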
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Neural Networks and Applications
