Valid Inference for Machine Learning Model Parameters

Neil Dey; Jonathan P. Williams

arXiv:2302.10840 · stat.ML · May 13, 2024

TL;DR

This paper develops a method to construct valid confidence sets for the optimal parameters of machine learning models using only training data, enabling reliable inference without population knowledge.

Contribution

It introduces a novel approach for valid inference on model parameters by constructing confidence sets that are data-driven and can be approximated with bootstrap methods.

Findings

01. Confidence sets are valid for the model's optimal (population-level) parameter

02. Bootstrap methods approximate the distribution of the confidence set well

03. The confidence sets are built from the training data alone, with no knowledge of the population

Abstract

The parameters of a machine learning model are typically learned by minimizing a loss function on a set of training data. However, this can come with the risk of overtraining; in order for the model to generalize well, it is of great importance that we are able to find the optimal parameter for the model on the entire population -- not only on the given training sample. In this paper, we construct valid confidence sets for this optimal parameter of a machine learning model, which can be generated using only the training data without any knowledge of the population. We then show that studying the distribution of this confidence set allows us to assign a notion of confidence to arbitrary regions of the parameter space, and we demonstrate that this distribution can be well-approximated using bootstrapping techniques.
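
As a rough illustration of the bootstrapping idea mentioned in the abstract, the sketch below refits a model on resampled training data and reports how often the refitted parameter lands in a chosen region of the parameter space. This is a plain nonparametric bootstrap of an empirical risk minimizer, not the paper's confidence-set construction. The least-squares model, the synthetic data, and every name in it (`fit_least_squares`, `bootstrap_estimates`, `region_frequency`, `theta_true`) are assumptions made for the example.

```python
import numpy as np


def fit_least_squares(X, y):
    """Empirical risk minimizer for squared loss (ordinary least squares)."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def bootstrap_estimates(X, y, n_boot=500, seed=0):
    """Refit the model on n_boot resamples (with replacement) of the training data."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    estimates = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        estimates[b] = fit_least_squares(X[idx], y[idx])
    return estimates


def region_frequency(estimates, in_region):
    """Fraction of bootstrap refits whose parameter lies in the given region."""
    return float(np.mean([in_region(theta) for theta in estimates]))


# Hypothetical synthetic training sample; theta_true is unknown in practice.
rng = np.random.default_rng(1)
theta_true = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ theta_true + rng.normal(scale=0.5, size=200)

estimates = bootstrap_estimates(X, y)

# Example region of the parameter space: first coefficient is positive.
freq = region_frequency(estimates, lambda theta: theta[0] > 0)
print(f"Bootstrap frequency of region: {freq:.3f}")
```

The frequency printed here is only a bootstrap proportion for the refitted estimator. The paper instead studies the distribution of a valid confidence set, so the two quantities should not be treated as interchangeable.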

Code & Models

Repositories

neil-dey/valid-inference-ml-estimators (tf, official)

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Neural Networks and Applications