A PAC-Bayesian Tutorial with A Dropout Bound

David McAllester

arXiv:1307.2118·cs.LG·July 9, 2013·70 cites

TL;DR

This tutorial reviews PAC-Bayesian theory through three generalization bounds: an Occam bound, a PAC-Bayesian bound for posterior distributions, and a training-variance bound, with applications to dropout and regularization.

Contribution

It provides a concise overview of PAC-Bayesian generalization bounds, including a bound for dropout training and insights into variance reduction methods.

Findings

01. PAC-Bayesian bounds handle infinite precision parameters and dropout.

02. The training-variance bound offers a new perspective on bias-variance analysis.

03. Dropout training can be analyzed within the PAC-Bayesian framework.

Abstract

This tutorial gives a concise overview of existing PAC-Bayesian theory focusing on three generalization bounds. The first is an Occam bound which handles rules with finite precision parameters and which states that generalization loss is near training loss when the number of bits needed to write the rule is small compared to the sample size. The second is a PAC-Bayesian bound providing a generalization guarantee for posterior distributions rather than for individual rules. The PAC-Bayesian bound naturally handles infinite precision rule parameters, $L_{2}$ regularization, provides a bound for dropout training, and defines a natural notion of a single distinguished PAC-Bayesian posterior distribution. The third bound is a training-variance bound, a kind of bias-variance analysis but with bias replaced by expected training loss. The training-variance bound dominates the other…
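
To make the first two bounds concrete, here is a minimal numerical sketch. It is not taken from the paper: it uses standard textbook forms that may differ from the exact statements in the tutorial, and it assumes losses in [0, 1]. The Occam form is the Hoeffding plus union-bound version with a prefix-free code of `num_bits` bits per rule. The PAC-Bayesian form is the square-root (Pinsker-relaxed) bound with a `ln(2*sqrt(n)/delta)` term. The function names and the isotropic Gaussian prior and posterior are illustrative assumptions.

```python
import math


def occam_bound(train_loss, num_bits, n, delta=0.05):
    """Upper bound on generalization loss for a rule written in num_bits bits.

    Assumed form (losses in [0, 1]):
        L(h) <= L_hat(h) + sqrt((num_bits * ln 2 + ln(1/delta)) / (2n))
    """
    if n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0 and 0 < delta < 1")
    complexity = num_bits * math.log(2) + math.log(1.0 / delta)
    return train_loss + math.sqrt(complexity / (2.0 * n))


def gaussian_kl(weights, sigma=1.0):
    """KL(N(w, sigma^2 I) || N(0, sigma^2 I)) = ||w||^2 / (2 sigma^2).

    This is where an L2 penalty on the weights appears in the bound.
    """
    return sum(w * w for w in weights) / (2.0 * sigma ** 2)


def pac_bayes_bound(train_loss, kl, n, delta=0.05):
    """Upper bound on the expected generalization loss of a posterior Q.

    Assumed form (losses in [0, 1]):
        L(Q) <= L_hat(Q) + sqrt((KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n))
    """
    if n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0 and 0 < delta < 1")
    complexity = kl + math.log(2.0 * math.sqrt(n) / delta)
    return train_loss + math.sqrt(complexity / (2.0 * n))


if __name__ == "__main__":
    # A rule that fits in 100 bits, trained on 10,000 examples.
    print(occam_bound(train_loss=0.10, num_bits=100, n=10_000))
    # A Gaussian posterior centered at weights (3, 4) with a unit-variance prior.
    kl = gaussian_kl([3.0, 4.0])
    print(pac_bayes_bound(train_loss=0.10, kl=kl, n=10_000))
```

In both functions the gap between training loss and the bound shrinks as the sample size grows relative to the complexity term (bits for the Occam bound, KL divergence for the PAC-Bayesian bound), which is the qualitative behavior the abstract describes.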

Taxonomy

Topics: Machine Learning and Algorithms · Neural Networks and Applications · Blind Source Separation Techniques