Fast learning rate of deep learning via a kernel perspective

Taiji Suzuki

arXiv:1705.10182·math.ST·May 31, 2017·5 cites

TL;DR

This paper introduces a kernel-based theoretical framework for analyzing deep learning's generalization error, deriving a faster learning rate and optimal layer width for empirical risk minimization and Bayesian deep learning.

Contribution

It develops an infinite dimensional kernel perspective to analyze deep neural networks, providing new insights into their generalization and convergence rates.

Findings

01. A convergence rate faster than O(1/√n) is achievable.

02. The optimal width of each internal layer can be determined from the degree of freedom of the corresponding kernel.

03. A new theoretical framework unifies the analysis of deep learning with kernel methods.
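To make the "degree of freedom" in the second finding concrete, here is a minimal sketch. It assumes the definition commonly used in kernel methods, N(λ) = Σ_j μ_j / (μ_j + λ), where μ_j are the kernel's eigenvalues and λ > 0 is a regularization level; this summary does not spell out the paper's exact definition, so treat the formula, the function name, and the toy spectrum below as illustrative assumptions.

```python
def degrees_of_freedom(eigenvalues, lam):
    """Effective dimension N(lam) = sum_j mu_j / (mu_j + lam).

    Assumed definition (standard in kernel methods), not quoted from the paper.
    `eigenvalues` are nonnegative kernel eigenvalues mu_j; `lam` must be positive.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return sum(mu / (mu + lam) for mu in eigenvalues)


# Hypothetical spectrum with polynomial decay mu_j = j^(-2), truncated at 10000 terms.
eigenvalues = [j ** -2.0 for j in range(1, 10001)]

# A smaller lam counts more eigen-directions as "effective".
for lam in (1e-1, 1e-2, 1e-3):
    print(lam, degrees_of_freedom(eigenvalues, lam))
```

Each term lies between 0 and 1, so N(λ) acts as a soft count of the eigen-directions whose eigenvalue exceeds λ. A fast-decaying spectrum gives a small count, which is the sense in which it can guide how wide a layer needs to be.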

Abstract

We develop a new theoretical framework to analyze the generalization error of deep learning, and derive a new fast learning rate for two representative algorithms: empirical risk minimization and Bayesian deep learning. A series of theoretical analyses of deep learning has revealed its high expressive power and universal approximation capability. Although these analyses are highly nonparametric, existing generalization error analyses have been developed mainly for fixed-dimensional parametric models. To fill this gap, we develop an infinite-dimensional model based on an integral form, as used in the analysis of the universal approximation capability. This allows us to define a reproducing kernel Hilbert space corresponding to each layer. Our point of view is to deal with the ordinary finite dimensional deep neural network as a finite approximation of the infinite…
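The idea of treating a finite-width layer as a finite approximation of an integral-form layer can be illustrated with a toy Monte Carlo sketch. Everything specific here is an assumption chosen for illustration and is not the paper's construction: a scalar input, hidden weights drawn from a standard normal distribution, a ReLU activation, and uniform output weights. Under these assumptions the integral form has the closed form E[relu(w·x)] = |x| / √(2π), which the finite average can be compared against.

```python
import math
import random


def relu(z):
    return max(z, 0.0)


def infinite_width_layer(x):
    # Closed form of the integral E_{w ~ N(0,1)}[relu(w * x)] = |x| / sqrt(2*pi).
    return abs(x) / math.sqrt(2.0 * math.pi)


def finite_width_layer(x, weights):
    # Width-m layer: average of relu(w_j * x) over m sampled hidden weights.
    return sum(relu(w * x) for w in weights) / len(weights)


rng = random.Random(0)  # fixed seed so the run is reproducible
x = 2.0
for m in (10, 1000, 100000):
    weights = [rng.gauss(0.0, 1.0) for _ in range(m)]
    err = abs(finite_width_layer(x, weights) - infinite_width_layer(x))
    print(m, err)
```

With random sampling the gap typically shrinks on the order of 1/√m. The width selection analyzed in the paper is a separate question from this toy sampling scheme.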

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks · Tensor decomposition and applications