On Exact Computation with an Infinitely Wide Neural Net

Sanjeev Arora; Simon S. Du; Wei Hu; Zhiyuan Li; Ruslan Salakhutdinov; Ruosong Wang

arXiv:1904.11955 · cs.LG · November 5, 2019 · 61 citations

TL;DR

This paper introduces an efficient exact algorithm, with a GPU implementation, for computing the Convolutional Neural Tangent Kernel (CNTK), the kernel that corresponds to an infinitely wide CNN. Used as a pure kernel method, CNTK classifies CIFAR-10 with accuracy close to that of finite deep nets.

Contribution

The paper provides the first efficient exact method for computing CNTK and shows that it performs strongly as a kernel method, connecting deep learning with kernel theory.

Findings

01

On CIFAR-10, CNTK achieves roughly 10 percentage points higher accuracy than previously reported kernel methods.

02

CNTK's accuracy is only 6% lower than that of the corresponding finite deep CNN architecture trained without batch normalization.

03

The paper gives the first non-asymptotic proof that a fully trained, sufficiently wide net is equivalent to the kernel regression predictor that uses the NTK.
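
As a rough illustration of what a kernel regression predictor is, the sketch below computes f(x) = k(x, X)^T K(X, X)^{-1} y in plain Python. This is not the paper's code: the function names, the toy data, and the RBF kernel used as a stand-in are all assumptions made for the example. In the setting above, the kernel would be the NTK or CNTK.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Move the row with the largest pivot candidate into place.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back-substitution on the upper-triangular system.
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def kernel_regression(kernel, X_train, y_train, x_test):
    """Unregularized kernel regression: k(x, X)^T K(X, X)^{-1} y."""
    K = [[kernel(a, b) for b in X_train] for a in X_train]
    alpha = solve(K, y_train)  # alpha = K^{-1} y
    return sum(a * kernel(x_test, xi) for a, xi in zip(alpha, X_train))

def rbf_kernel(a, b):
    """Stand-in kernel for the demo (an assumption, not the NTK)."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Hypothetical toy data: three 1-D points with +/-1 labels.
X_train = [[0.0], [1.0], [2.0]]
y_train = [1.0, -1.0, 1.0]
print(kernel_regression(rbf_kernel, X_train, y_train, [1.5]))
```

Because no regularization is used, the predictor interpolates the training set: evaluating it at a training input returns that input's label up to rounding error.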

Abstract

How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset such as CIFAR-10 when its width (namely, the number of channels in convolutional layers and the number of nodes in fully-connected internal layers) is allowed to increase to infinity? Such questions have come to the forefront in the quest to theoretically understand deep learning and its mysteries about optimization and generalization. They also connect deep learning to notions such as Gaussian processes and kernels. A recent paper [Jacot et al., 2018] introduced the Neural Tangent Kernel (NTK) which captures the behavior of fully-connected deep nets in the infinite width limit trained by gradient descent; this object was implicit in some other recent papers. An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite…
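
For the fully-connected case mentioned in the abstract, the NTK of a ReLU net can be evaluated with a short layer-by-layer recursion. The sketch below is a minimal illustration, not the paper's implementation: it assumes the common normalization c_sigma = 2 (which keeps Sigma(x, x) equal to the squared norm of x at every layer), nonzero inputs, and a hypothetical function name `relu_ntk`. The convolutional kernel (CNTK) that the paper focuses on needs a more involved recursion that is not shown here.

```python
import math

def relu_ntk(x, y, depth=3):
    """NTK value Theta(x, y) for a fully-connected ReLU net with
    `depth` hidden layers, assuming c_sigma = 2 and nonzero inputs."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    sxx, syy, sxy = dot(x, x), dot(y, y), dot(x, y)  # Sigma^(0)
    theta = sxy                                      # Theta^(0) = Sigma^(0)
    for _ in range(depth):
        norm = math.sqrt(sxx * syy)
        rho = max(-1.0, min(1.0, sxy / norm))  # correlation, clipped for acos
        angle = math.acos(rho)
        # Closed forms of the Gaussian expectations for ReLU:
        # Sigma^(h) = c_sigma * E[relu(u) relu(v)]
        sigma = norm * (rho * (math.pi - angle) + math.sqrt(1.0 - rho * rho)) / math.pi
        # Sigma_dot^(h) = c_sigma * E[relu'(u) relu'(v)]
        sigma_dot = (math.pi - angle) / math.pi
        # Theta^(h) = Theta^(h-1) * Sigma_dot^(h) + Sigma^(h)
        theta = theta * sigma_dot + sigma
        sxy = sigma  # with c_sigma = 2 the diagonal terms sxx, syy do not change
    return theta

print(relu_ntk([1.0, 0.0], [0.6, 0.8], depth=3))
```

One sanity check on the recursion: for a unit-norm input, Theta(x, x) equals depth + 1, since every layer contributes Sigma = 1 and Sigma_dot = 1.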

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications

Methods: Neural Tangent Kernel · 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax