Understanding training and generalization in deep learning by Fourier analysis

Zhiqin John Xu

arXiv:1808.04295 · cs.LG · November 30, 2018 · 42 cites

Open Access

TL;DR

This paper uses Fourier analysis to explain why deep neural networks prioritize low-frequency components during training and how small initialization improves generalization, supported by experiments on various datasets.

Contribution

It introduces a Fourier-based theoretical framework that explains the training dynamics and generalization properties of DNNs, emphasizing the role of initialization and frequency prioritization.

Findings

01

DNNs prioritize low-frequency components during training

02

Small initialization improves generalization while maintaining fitting ability

03

Experimental validation on natural images, 1D functions, and MNIST

Abstract

Background: Why Deep Neural Networks (DNNs), which have many more parameters than training data and are trained by (stochastic) gradient-based methods, often achieve remarkably low generalization error remains an open theoretical question. Contribution: We study DNN training by Fourier analysis. Our theoretical framework explains two phenomena: i) a DNN trained with (stochastic) gradient-based methods often gives low-frequency components of the target function higher priority during training; ii) small initialization leads to good generalization while preserving the DNN's ability to fit any function. These results are further confirmed by experiments in which DNNs fit natural images, one-dimensional functions, and the MNIST dataset.
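
The frequency-priority claim in the abstract can be probed with a small experiment. The sketch below is not the paper's code: it is a minimal illustration, assuming a one-hidden-layer tanh network, full-batch gradient descent, and a hypothetical target sin(x) + 0.5 sin(4x) sampled on 64 points. The function name train_and_track and all hyperparameters (hidden width, init_std, lr) are illustrative choices. At each step it records the relative error of the network output at the two target frequencies, measured with the discrete Fourier transform.

```python
import numpy as np

def train_and_track(steps=3000, hidden=40, init_std=0.1, lr=0.01, seed=0, freqs=(1, 4)):
    """Fit a 1D target with a tiny tanh network and track per-frequency error.

    Returns an array of shape (steps + 1, 2): column 0 is the relative DFT
    error at the low frequency, column 1 at the high frequency.
    All settings here are illustrative assumptions, not the paper's setup.
    """
    rng = np.random.default_rng(seed)
    n = 64
    # Uniform grid over one period, so sin(k x) peaks at DFT index k.
    x = np.linspace(-np.pi, np.pi, n, endpoint=False)
    target = np.sin(freqs[0] * x) + 0.5 * np.sin(freqs[1] * x)
    target_hat = np.fft.rfft(target)
    idx = list(freqs)

    # Small random initialization of a one-hidden-layer network.
    w1 = init_std * rng.standard_normal(hidden)
    b1 = init_std * rng.standard_normal(hidden)
    w2 = init_std * rng.standard_normal(hidden)
    b2 = 0.0

    history = []
    for _ in range(steps + 1):
        # Forward pass.
        h = np.tanh(np.outer(x, w1) + b1)   # shape (n, hidden)
        y = h @ w2 + b2                     # shape (n,)

        # Relative error of the output at the two tracked frequencies.
        y_hat = np.fft.rfft(y)
        rel = np.abs(y_hat[idx] - target_hat[idx]) / np.abs(target_hat[idx])
        history.append(rel)

        # Gradients of the loss 0.5 * mean((y - target)^2).
        err = (y - target) / n
        grad_pre = np.outer(err, w2) * (1.0 - h ** 2)
        g_w2 = h.T @ err
        g_b2 = err.sum()
        g_w1 = x @ grad_pre
        g_b1 = grad_pre.sum(axis=0)

        # Plain gradient-descent update.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    return np.array(history)

if __name__ == "__main__":
    errs = train_and_track()
    for step in (0, 100, 500, 1000, 3000):
        print(step, errs[step])
```

If the low-frequency priority described in the abstract holds for this toy setup, the first column should fall noticeably earlier in training than the second. Rerunning with a larger init_std is one way to explore the initialization claim, although this sketch measures only fitting error on the training grid and does not measure generalization.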

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning