Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem

Alon Brutzkus; Amir Globerson

arXiv:1810.03037 · cs.LG · January 30, 2019 · 26 cites

TL;DR

This paper offers a theoretical and empirical explanation for why overparameterized neural networks, especially convolutional ones, tend to generalize better, attributing the effect to an interplay between weight clustering and feature exploration at initialization.

Contribution

It introduces a theoretical analysis, in a novel setting that extends the XOR problem, of why overparameterization improves generalization in convolutional networks, and supports it with empirical results on MNIST.
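To make the kind of setting concrete, here is a minimal sketch under loudly stated assumptions; it is not the paper's exact construction. We assume inputs are sequences of 2-dimensional ±1 patches, call (1, 1) and (-1, -1) positive patterns and the other two negative (the XOR split), and label an input positive if any patch is a positive pattern. The network is one plausible reading of "a 3-layer convolutional network with max-pooling": a width-2 convolution over non-overlapping patches with ReLU, max-pooling over patches for each filter, and a fixed output layer that adds one group of pooled filters and subtracts the other. The names `label`, `forward`, `W_HAND`, and `U_HAND` are hypothetical.

```python
import random

# Hypothetical encoding of an XOR-style task on 2-dimensional +/-1 patches.
POS_PATTERNS = [(1, 1), (-1, -1)]   # x1 * x2 == +1
NEG_PATTERNS = [(1, -1), (-1, 1)]   # x1 * x2 == -1
ALL_PATTERNS = POS_PATTERNS + NEG_PATTERNS


def label(x):
    """Assumed rule: +1 if any patch is a positive pattern, otherwise -1."""
    return 1 if any(p in POS_PATTERNS for p in x) else -1


def sample_input(num_patches, rng):
    """Draw an input of num_patches patches, each uniform over the 4 patterns."""
    return [rng.choice(ALL_PATTERNS) for _ in range(num_patches)]


def relu(z):
    return max(0.0, z)


def forward(x, w_filters, u_filters):
    """Width-2 convolution over non-overlapping patches, ReLU, max-pooling over
    patches for each filter, then a fixed output layer: +1 weight on every
    pooled w filter and -1 on every pooled u filter."""
    def pooled(f):
        return max(relu(f[0] * p[0] + f[1] * p[1]) for p in x)
    return sum(pooled(w) for w in w_filters) - sum(pooled(u) for u in u_filters)


# Hand-set filters (our own construction, not taken from the paper): the w
# filters are scaled detectors of the two positive patterns, the u filters
# detect the two negative patterns. A matched positive pattern contributes 6,
# while the two u filters together contribute at most 4, so any input with a
# positive patch scores > 0. An all-negative input gets 0 from the w filters
# and at least 2 from the u filters, so it scores < 0.
W_HAND = [(3.0, 3.0), (-3.0, -3.0)]
U_HAND = [(1.0, -1.0), (-1.0, 1.0)]

if __name__ == "__main__":
    rng = random.Random(0)
    xs = [sample_input(5, rng) for _ in range(1000)]
    correct = sum((forward(x, W_HAND, U_HAND) > 0) == (label(x) == 1) for x in xs)
    print(correct / len(xs))  # 1.0: the hand-set filters classify every input
```

The hand-set filters only show that four filters can fit this toy task exactly. The abstract's argument concerns which global minimum gradient descent reaches from random initialization and how that depends on network size; this sketch fixes the task and architecture but does not train anything.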

Findings

01 With overparameterization, gradient descent converges to global minima that generalize better than the global minima of small networks.

02 An interplay between weight clustering and feature exploration at initialization drives the improved generalization.

03 Experiments on MNIST support the theoretical insights.

Abstract

Empirical evidence suggests that neural networks with ReLU activations generalize better with over-parameterization. However, there is currently no theoretical analysis that explains this observation. In this work, we provide theoretical and empirical evidence that, in certain cases, overparameterized convolutional networks generalize better than small networks because of an interplay between weight clustering and feature exploration at initialization. We demonstrate this theoretically for a 3-layer convolutional neural network with max-pooling, in a novel setting which extends the XOR problem. We show that this interplay implies that with overparameterization, gradient descent converges to global minima with better generalization performance compared to global minima of small networks. Empirically, we demonstrate these phenomena for a 3-layer convolutional neural network in the MNIST…
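One simplified way to see why width could matter at initialization: assume the same four XOR-style patterns (1, 1), (-1, -1), (1, -1), (-1, 1), say a randomly initialized filter "detects" the pattern it has the largest dot product with, and ask how likely it is that k Gaussian filters together detect all four. This proxy for "feature exploration" is our own assumption, not the paper's definition. The hypothetical helper `coverage_rate` estimates it by simulation and `coverage_exact` gives the closed form, which holds because the four patterns are 90 degrees apart and a standard Gaussian filter is equally likely to point toward each of them.

```python
import random

# Assumed 2-dimensional +/-1 patterns of an XOR-style task (hypothetical encoding).
PATTERNS = [(1, 1), (-1, -1), (1, -1), (-1, 1)]


def best_pattern(f):
    """Index of the pattern with the largest dot product with filter f
    (our simplified notion of which pattern a filter 'detects')."""
    return max(range(len(PATTERNS)),
               key=lambda i: f[0] * PATTERNS[i][0] + f[1] * PATTERNS[i][1])


def covers_all(k, rng):
    """Draw k filters from N(0, I) and check whether every pattern is
    detected by at least one of them."""
    filters = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(k)]
    return len({best_pattern(f) for f in filters}) == len(PATTERNS)


def coverage_rate(k, trials=2000, seed=0):
    """Monte Carlo estimate of P(all patterns covered) for k filters."""
    rng = random.Random(seed)
    return sum(covers_all(k, rng) for _ in range(trials)) / trials


def coverage_exact(k):
    """Closed form by inclusion-exclusion: each filter's detected pattern is
    uniform over the 4 patterns (N(0, I) is rotation invariant), so this is
    the coupon-collector probability of seeing all 4 in k draws."""
    return 1 - 4 * (3 / 4) ** k + 6 * (1 / 2) ** k - 4 * (1 / 4) ** k


if __name__ == "__main__":
    for k in (2, 4, 10, 50):
        print(k, coverage_rate(k), round(coverage_exact(k), 4))
```

Under these assumptions, full coverage is impossible with k = 2 filters, has probability 24/256 ≈ 0.094 with k = 4, and is roughly 0.99 with k = 20. The sketch says nothing about weight clustering during training, which the abstract pairs with exploration.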

Taxonomy

Topics: Neural Networks and Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning