An Information-Theoretic Perspective on Overfitting and Underfitting

Daniel Bashir; George D. Montanez; Sonia Sehra; Pedro Sandoval Segura; Julius Lauw

arXiv:2010.06076·cs.LG·November 10, 2020

TL;DR

This paper introduces an information-theoretic framework for analyzing overfitting and underfitting in machine learning. It proves that determining whether an arbitrary classification algorithm will overfit a dataset is formally undecidable, and it relates algorithm capacity to recent information-theoretic approaches to generalization.

Contribution

The paper measures algorithm capacity via the information transferred from datasets to models, upper-bounds that capacity, and relates it to quantities in the algorithmic search framework for machine learning.

Findings

01. Proves the undecidability of overfitting prediction.

02. Provides upper bounds on algorithm capacity.

03. Links capacity measures to generalization theory.

Abstract

We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset. Measuring algorithm capacity via the information transferred from datasets to models, we consider mismatches between algorithm capacities and datasets to provide a signature for when a model can overfit or underfit a dataset. We present results upper-bounding algorithm capacity, establish its relationship to quantities in the algorithmic search framework for machine learning, and relate our work to recent information-theoretic approaches to generalization.
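
The idea of measuring the information transferred from datasets to models can be illustrated with a toy calculation. The following is a minimal sketch, not the paper's construction: it assumes a uniform distribution over all binary label vectors of length 3, and it uses two hypothetical learners (`majority_learner` and `memorizing_learner`, names invented here) to show how the mutual information between dataset and model differs between a low-capacity and a high-capacity algorithm. The paper's own capacity definition and bounds are not reproduced here.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(joint):
    """Return I(D; M) in bits for a joint distribution {(d, m): probability}."""
    p_d, p_m = Counter(), Counter()
    for (d, m), p in joint.items():
        p_d[d] += p
        p_m[m] += p
    return sum(
        p * math.log2(p / (p_d[d] * p_m[m]))
        for (d, m), p in joint.items()
        if p > 0
    )


def majority_learner(labels):
    """Hypothetical low-capacity learner: a constant model predicting the majority label."""
    return int(2 * sum(labels) > len(labels))


def memorizing_learner(labels):
    """Hypothetical high-capacity learner: the model is a copy of the training labels."""
    return tuple(labels)


def transferred_bits(learner, n_labels=3):
    """Mutual information between dataset and model under a uniform dataset distribution.

    Assumption for illustration only: every binary label vector of length
    n_labels is equally likely, and the learner is deterministic.
    """
    datasets = list(product([0, 1], repeat=n_labels))
    p = 1 / len(datasets)
    joint = Counter()
    for d in datasets:
        joint[(d, learner(d))] += p
    return mutual_information(joint)


if __name__ == "__main__":
    print("majority learner:  ", transferred_bits(majority_learner), "bits")
    print("memorizing learner:", transferred_bits(memorizing_learner), "bits")
```

In this toy setting the memorizing learner transfers every bit of the labels into the model, while the majority learner keeps only one bit. A dataset whose labels are mostly noise would therefore be a poor match for the memorizing learner, which is the kind of capacity-versus-dataset mismatch the abstract describes as a signature for overfitting.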
