Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification

Xiaohan Zhu; Mesrob I. Ohannessian; Nathan Srebro

arXiv:2603.22644·stat.ML·March 25, 2026

Open Access

TL;DR

This paper analyzes a PAC-Bayes learning rule for noisy binary classification, showing how the choice of regularization parameter affects overfitting and generalization, and extending previous work to continuous priors and randomized predictions.

Contribution

It extends PAC-Bayes analysis to continuous priors and randomized predictors, providing a detailed characterization of regularization effects on overfitting and generalization.

Findings

01. Choosing a large balancing parameter ($λ ≫ 1$) ensures uniformly vanishing excess loss.

02. Over-regularization can lead to underfitting, while under-regularization causes overfitting.

03. The work generalizes the previous discrete-prior analysis to continuous priors.

Abstract

We consider a PAC-Bayes type learning rule for binary classification, balancing the training error of a randomized "posterior" predictor with its KL divergence to a pre-specified "prior". This can be seen as an extension of a modified two-part-code Minimum Description Length (MDL) learning rule to continuous priors and randomized predictions. With a balancing parameter of $λ = 1$, this learning rule recovers an (empirical) Bayes posterior, and a modified variant recovers the profile posterior, linking with standard Bayesian prediction (up to the treatment of the single-parameter noise level). However, from a risk-minimization prediction perspective, this Bayesian predictor overfits and can lead to non-vanishing excess loss in the agnostic case. Instead, a choice of $λ ≫ 1$, which can be seen as using a sample-size-dependent prior, ensures uniformly vanishing excess loss…
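To make the trade-off between training error and KL divergence concrete, here is a minimal sketch over a finite hypothesis set. It is not the paper's learning rule, whose exact objective is not stated in this excerpt. It assumes the generic linear objective `beta * n * E_Q[err] + lam * KL(Q || prior)`, whose minimizer is the Gibbs distribution `Q(h) ∝ prior(h) * exp(-beta * n * err(h) / lam)`. The names `gibbs_posterior`, `kl`, `beta`, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import math


def gibbs_posterior(prior, train_errors, n, lam, beta=1.0):
    """Minimizer over Q of beta * n * E_Q[err] + lam * KL(Q || prior).

    Assumed generic objective (not taken from the paper). For a finite
    hypothesis set with strictly positive prior weights, the minimizer is
    Q(h) proportional to prior(h) * exp(-beta * n * err(h) / lam).
    """
    log_w = [math.log(p) - beta * n * e / lam
             for p, e in zip(prior, train_errors)]
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(v - m) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def kl(q, p):
    """KL divergence KL(q || p) between two finite distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


# Toy setup (hypothetical): three hypotheses, prior weights, training errors.
prior = [0.5, 0.3, 0.2]
train_errors = [0.30, 0.10, 0.05]
n = 100

q_small = gibbs_posterior(prior, train_errors, n, lam=1.0)   # weak regularization
q_large = gibbs_posterior(prior, train_errors, n, lam=50.0)  # strong regularization

print("lam = 1 :", q_small, "KL to prior:", kl(q_small, prior))
print("lam = 50:", q_large, "KL to prior:", kl(q_large, prior))
```

In this toy setting, a small `lam` lets the posterior concentrate on the hypothesis with the lowest training error, while a large `lam` keeps it close to the prior. That mirrors, in a simplified form, the role of the balancing parameter described in the abstract.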

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning