Information-Corrected Estimation: A Generalization Error Reducing Parameter Estimation Method

Matthew Dixon; Tyler Ward

arXiv:1803.04947 · stat.CO · November 4, 2021 · Entropy

TL;DR

This paper introduces Information-Corrected Estimation (ICE), a novel objective function that reduces KL-divergence-based generalization error in supervised learning and, in the reported experiments, outperforms traditional methods such as maximum likelihood estimation and L2 regularization.

Contribution

The paper proposes ICE, a new estimation method that reduces generalization error by directly optimizing a corrected likelihood related to KL divergence, with theoretical and experimental validation.

Findings

01. ICE significantly reduces generalization error in experiments.

02. ICE outperforms maximum likelihood and L2 regularization.

03. Theoretically effective for a wide class of models.

Abstract

Modern computational models in supervised machine learning are often highly parameterized universal approximators. As such, the value of the parameters is unimportant, and only the out-of-sample performance is considered. On the other hand, much of the literature on model estimation assumes that the parameters themselves have intrinsic value, and thus is concerned with the bias and variance of parameter estimates, which may not have any simple relationship to out-of-sample model performance. Therefore, within supervised machine learning, heavy use is made of ridge regression (i.e., L2 regularization), which requires the estimation of hyperparameters and can be rendered ineffective by certain model parameterizations. We introduce an objective function which we refer to as Information-Corrected Estimation (ICE) that reduces KL-divergence-based generalization error for supervised machine…
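The abstract's remark that L2 regularization depends on a hyperparameter and can be rendered ineffective by certain parameterizations can be illustrated with a minimal sketch. This is not code from the paper and does not implement ICE. The one-feature, no-intercept linear model, the data values, the penalty `lam`, and the helper names `ridge_slope` and `fitted_values` are all assumptions chosen for illustration. For this toy model, rescaling the feature by a factor c is equivalent to dividing the penalty by c squared, so the same `lam` shrinks the fit far less after rescaling.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for a one-feature, no-intercept linear model."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def fitted_values(xs, ys, lam, scale=1.0):
    """Fit ridge on the rescaled feature scale * x; return training predictions."""
    zs = [scale * x for x in xs]
    w = ridge_slope(zs, ys, lam)
    return [w * z for z in zs]


# Hypothetical toy data (not from the paper).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
lam = 10.0  # hypothetical L2 penalty; in practice it must be tuned

ols = fitted_values(xs, ys, lam=0.0)                       # unpenalized fit
ols_rescaled = fitted_values(xs, ys, lam=0.0, scale=100.0)
ridge = fitted_values(xs, ys, lam=lam)                     # penalized fit
ridge_rescaled = fitted_values(xs, ys, lam=lam, scale=100.0)

# Largest change in predictions caused by the penalty, before and after rescaling.
shrink = max(abs(a - b) for a, b in zip(ols, ridge))
shrink_rescaled = max(abs(a - b) for a, b in zip(ols, ridge_rescaled))
print(shrink, shrink_rescaled)
```

In this sketch the unpenalized predictions are unchanged by the rescaling, while the penalized predictions move almost all the way back to the unpenalized ones, which is one simple way a reparameterization can neutralize a fixed L2 penalty.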
