A PAC-Bayes oracle inequality for sparse neural networks

Maximilian F. Steffen; Mathias Trabs

arXiv:2204.12392·math.ST·January 9, 2026

TL;DR

This paper proves a PAC-Bayes oracle inequality for sparse deep neural networks in nonparametric regression. The inequality shows that the Gibbs posterior adapts to the unknown regularity and hierarchical structure of the regression function and attains the minimax-optimal rate of convergence up to a logarithmic factor.

Contribution

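The paper proves an oracle inequality for the Gibbs posterior over sparse deep neural networks, using a mixture over uniform priors on sparse sets of network weights. The resulting convergence rates are adaptive and minimax-optimal up to a logarithmic factor.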

Findings

01

The Gibbs posterior can be accessed via Metropolis-adjusted Langevin algorithms.

02

The method adapts to unknown regularity and hierarchical structure of the target function.

03

The estimator achieves the minimax-optimal rate of convergence up to a logarithmic factor.

Abstract

We study the Gibbs posterior distribution for sparse deep neural nets in a nonparametric regression setting. The posterior can be accessed via Metropolis-adjusted Langevin algorithms. Using a mixture over uniform priors on sparse sets of network weights, we prove an oracle inequality which shows that the method adapts to the unknown regularity and hierarchical structure of the regression function. The estimator achieves the minimax-optimal rate of convergence (up to a logarithmic factor).
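
To make the sampling step concrete, the following is a minimal sketch of a Metropolis-adjusted Langevin algorithm (MALA) targeting a Gibbs posterior with density proportional to exp(-lam * R_n(theta)) with respect to the prior, where R_n is the empirical squared-error risk. It is an illustration under simplifying assumptions, not the authors' implementation: the network is a single-hidden-layer ReLU net, the prior is one uniform distribution on the box [-bound, bound]^P (the paper uses a mixture over uniform priors on sparse weight sets, which this sketch omits), and all names (`risk_and_grad`, `mala_gibbs`, `lam`, `step`, `bound`) as well as the synthetic data and tuning values are hypothetical.

```python
import numpy as np

def unpack(theta, d, h):
    """Split the flat parameter vector into the layers of a one-hidden-layer net."""
    W1 = theta[: h * d].reshape(h, d)
    b1 = theta[h * d : h * d + h]
    w2 = theta[h * d + h : h * d + 2 * h]
    b2 = theta[-1]
    return W1, b1, w2, b2

def risk_and_grad(theta, X, y, h):
    """Empirical risk R_n(theta) = mean squared error, and its gradient in theta."""
    n, d = X.shape
    W1, b1, w2, b2 = unpack(theta, d, h)
    pre = X @ W1.T + b1                  # pre-activations, shape (n, h)
    act = np.maximum(pre, 0.0)           # ReLU
    res = act @ w2 + b2 - y              # residuals, shape (n,)
    risk = np.mean(res ** 2)
    g_pred = 2.0 * res / n               # dR/dprediction
    g_w2 = act.T @ g_pred
    g_b2 = g_pred.sum()
    g_pre = np.outer(g_pred, w2) * (pre > 0)
    g_W1 = g_pre.T @ X
    g_b1 = g_pre.sum(axis=0)
    grad = np.concatenate([g_W1.ravel(), g_b1, g_w2, [g_b2]])
    return risk, grad

def mala_gibbs(X, y, h, lam, step, n_iter, bound, rng):
    """MALA chain for the target exp(-lam * R_n(theta)) on the box [-bound, bound]^P.

    Assumption: a single uniform box prior, so the log-prior is constant inside
    the box and proposals outside the box are rejected.
    """
    d = X.shape[1]
    P = h * d + 2 * h + 1
    theta = rng.uniform(-0.5, 0.5, size=P)
    risk, grad = risk_and_grad(theta, X, y, h)
    samples = np.empty((n_iter, P))
    accepted = 0
    for t in range(n_iter):
        # Langevin proposal: drift along grad log target = -lam * grad R_n.
        mean_fwd = theta - step * lam * grad
        prop = mean_fwd + np.sqrt(2.0 * step) * rng.standard_normal(P)
        if np.all(np.abs(prop) <= bound):
            risk_p, grad_p = risk_and_grad(prop, X, y, h)
            mean_bwd = prop - step * lam * grad_p
            # Log densities of the Gaussian proposal, up to a shared constant.
            log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (4.0 * step)
            log_q_bwd = -np.sum((theta - mean_bwd) ** 2) / (4.0 * step)
            log_alpha = -lam * (risk_p - risk) + log_q_bwd - log_q_fwd
            # Metropolis-Hastings accept/reject step.
            if np.log(rng.uniform()) < log_alpha:
                theta, risk, grad = prop, risk_p, grad_p
                accepted += 1
        samples[t] = theta
    return samples, accepted / n_iter

# Hypothetical synthetic regression data; values chosen only for illustration.
rng = np.random.default_rng(0)
n, d, h = 50, 2, 4
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(n)
n_iter = 500
samples, acc_rate = mala_gibbs(X, y, h, lam=float(n), step=1e-3,
                               n_iter=n_iter, bound=10.0, rng=rng)
print("acceptance rate:", acc_rate)
print("risk at last draw:", risk_and_grad(samples[-1], X, y, h)[0])
```

The accept/reject correction is what distinguishes MALA from an unadjusted Langevin scheme: it keeps the Gibbs posterior as the invariant distribution of the chain for any step size, at the cost of one extra risk and gradient evaluation per proposal. In practice `step` and `lam` would need tuning, and the early part of the chain would typically be discarded as burn-in.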
