Contrasting random and learned features in deep Bayesian linear regression
Jacob A. Zavatone-Veth, William L. Tong, and Cengiz Pehlevan

TL;DR
This paper compares the generalization behavior of deep Bayesian linear neural networks in which all layers are trained with that of deep random feature models, showing how width, depth, data density, and prior mismatch shape double descent and the widths that are optimal for generalization.
Contribution
It provides a detailed analysis of how feature learning impacts generalization in deep Bayesian linear models, highlighting differences between trained and random features.
Findings
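Both models exhibit sample-wise double descent in the presence of label noise.
Random feature models can also show model-wise double descent when there are narrow bottleneck layers; deep networks in which all layers are trained do not show these divergences.
Random feature models can have particular widths that are optimal for generalization at a given data density.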
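The first finding can be illustrated numerically with a minimal sketch. This is not the paper's calculation. It assumes a single hidden layer of fixed random linear features, unstructured Gaussian inputs, and a ridge-regularized readout with a small regularizer, used here as a stand-in for the posterior mean under a broad Gaussian prior. The function name `rf_test_error` and all parameter values (`d`, `width`, `noise_std`, `ridge`) are hypothetical choices made for illustration. With noisy labels, the test error of such a model should rise as the number of training samples approaches the interpolation threshold near min(d, width) and fall again beyond it.

```python
import numpy as np

def rf_test_error(n_train, d=40, width=40, noise_std=0.5, ridge=1e-6,
                  n_test=500, n_trials=30, seed=0):
    """Average test MSE of a one-hidden-layer linear random feature model.

    Only the readout is fit; the feature weights W stay at their random values.
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        beta = rng.standard_normal(d)                     # linear teacher
        W = rng.standard_normal((d, width)) / np.sqrt(d)  # fixed random features
        X = rng.standard_normal((n_train, d))             # unstructured Gaussian inputs
        X_test = rng.standard_normal((n_test, d))
        # Training labels carry additive Gaussian noise; test targets are noiseless.
        y = X @ beta / np.sqrt(d) + noise_std * rng.standard_normal(n_train)
        y_test = X_test @ beta / np.sqrt(d)
        F, F_test = X @ W, X_test @ W
        # Ridge readout: solve (F^T F + ridge * I) a = F^T y.
        A = F.T @ F + ridge * np.eye(width)
        a = np.linalg.solve(A, F.T @ y)
        errors.append(np.mean((F_test @ a - y_test) ** 2))
    return float(np.mean(errors))

# Sweep the number of training samples across the threshold at n_train = 40.
sample_sizes = [10, 20, 40, 80, 400]
curve = {n: rf_test_error(n) for n in sample_sizes}
for n, err in curve.items():
    print(f"n_train={n:4d}  test MSE={err:.3f}")
```

Setting `noise_std=0.0` or increasing `ridge` is a quick way to check how much of the peak is driven by label noise and weak regularization in this toy setting. Because only the readout is trained, the sketch covers the random feature side of the comparison only.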
Abstract
Understanding how feature learning affects generalization is among the foremost goals of modern deep learning theory. Here, we study how the ability to learn representations affects the generalization performance of a simple class of models: deep Bayesian linear neural networks trained on unstructured Gaussian data. By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch. We show that both models display sample-wise double-descent behavior in the presence of label noise. Random feature models can also display model-wise double-descent if there are narrow bottleneck layers, while deep networks do not show these divergences. Random feature models can have particular widths that are optimal for generalization at a given data density, while making…
