Understanding Weight Normalized Deep Neural Networks with Rectified Linear Units

Yixi Xu; Xiao Wang

arXiv:1810.01877 · cs.LG · November 29, 2018 · 6 cites

TL;DR

This paper develops a theoretical framework for understanding the capacity and approximation properties of weight normalized deep neural networks with ReLU activations, focusing on $L_{p,q}$ normalization and its implications for generalization.
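The capacity measure the abstract bounds is the Rademacher complexity. A minimal sketch of it, for the simplest related class: linear functions $x \mapsto w \cdot x$ with $\|w\|_1 \leq c$, i.e. one unit under an $L_1$ constraint. This one-layer analogue, the function name, and the random data are assumptions chosen here for illustration; it is not the paper's deep-network bound. For this class the supremum over $w$ has a closed form, so only the Rademacher signs need to be sampled.

```python
import numpy as np

def empirical_rademacher_l1_linear(X, c, trials=500, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    {x -> w.x : ||w||_1 <= c} on the sample X (shape n x d).

    Closed form of the inner supremum (attained at a vertex +-c*e_j):
        sup_w (1/n) sum_i sigma_i w.x_i = (c/n) * ||sum_i sigma_i x_i||_inf
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each row of sigma is one draw of i.i.d. Rademacher signs.
    sigma = rng.choice([-1.0, 1.0], size=(trials, n))
    sups = c * np.abs(sigma @ X).max(axis=1) / n
    return float(sups.mean())

rng = np.random.default_rng(1)
n, d, c = 200, 50, 1.0
X = rng.uniform(-1.0, 1.0, size=(n, d))  # hypothetical sample
estimate = empirical_rademacher_l1_linear(X, c)
# Massart-type bound for the L1 ball: c * max|x_ij| * sqrt(2 log(2d) / n)
bound = c * np.abs(X).max() * np.sqrt(2 * np.log(2 * d) / n)
print(estimate, bound)
```

The bound decays like $1/\sqrt{n}$ and depends on the input dimension $d$ only logarithmically, which is the kind of size-insensitive behavior that norm-based capacity control aims for.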

Contribution

It introduces a norm-based capacity control framework for $L_{p,q}$ weight normalized networks and analyzes their approximation and generalization properties.
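A sketch of the $L_{p,q}$ norm behind this framework, assuming the convention that row $j$ of a weight matrix holds the incoming weights of hidden unit $j$: take the $L_p$ norm of every row, then the $L_q$ norm of those row norms, $\|W\|_{p,q} = \big(\sum_j \|W_{j,:}\|_p^{q}\big)^{1/q}$. The helper names and the global rescaling used to enforce $\|W\|_{p,q} \leq c$ are illustrative choices, not the authors' code.

```python
import numpy as np

def lpq_norm(W: np.ndarray, p: float, q: float) -> float:
    # Assumed row convention: W[j] holds the incoming weights of hidden unit j.
    # Take the L_p norm of every row, then the L_q norm of those row norms.
    row_norms = np.linalg.norm(W, ord=p, axis=1)
    return float(np.linalg.norm(row_norms, ord=q))

def lpq_normalize(W: np.ndarray, p: float, q: float, c: float) -> np.ndarray:
    # Hypothetical helper: rescale W so that ||W||_{p,q} <= c.
    # The norm is absolutely homogeneous, so one global rescale suffices.
    norm = lpq_norm(W, p, q)
    return W if norm <= c else W * (c / norm)

W = np.array([[1.0, -2.0], [3.0, 0.0]])
print(lpq_norm(W, 1, np.inf))  # rows have L1 norms 3 and 3, so this prints 3.0
print(lpq_norm(W, 2, 2))       # equals the Frobenius norm, sqrt(14)
```

With $q = \infty$ the norm is the largest per-unit $L_p$ norm, so $L_{1,\infty}$ normalization caps the $L_1$ norm of every unit's incoming weights.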

Findings

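1. Capacity bounds depend on the network depth and the normalization parameters; when $q \leq p^{*}$, they are independent of width and grow with depth only through a square-root term.

2. For $L_{1,\infty}$ weight normalized networks, the approximation error is controlled by the $L_1$ norm of the output layer.

3. For $L_{1,\infty}$ weight normalized networks, the generalization error depends on the architecture only through the square root of the network depth.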
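A minimal sketch of an $L_{1,\infty}$ weight normalized ReLU network, assuming bias-free layers, rows as per-unit incoming weights, and inputs with $\|x\|_\infty \leq 1$ (all assumptions made here). Since $|w \cdot h| \leq \|w\|_1 \|h\|_\infty$ and ReLU never increases magnitude, every hidden activation is at most $c^k$ after $k$ layers, so $|f(x)| \leq \|v\|_1 c^k$ for output weights $v$. This elementary bound only illustrates why the $L_1$ norm of the output layer acts as a natural scale parameter, as in the second finding; it is not the paper's approximation theorem.

```python
import numpy as np

def normalize_rows_l1(W, c=1.0):
    # Shrink every row whose L1 norm exceeds c, so that ||W||_{1,inf} <= c
    # under the assumed row convention; rows already within c are unchanged.
    row_l1 = np.abs(W).sum(axis=1, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(row_l1, 1e-12))
    return W * scale

def forward(x, hidden, v):
    # Bias-free ReLU network: h <- max(W h, 0) per layer, then a linear output.
    h = x
    for W in hidden:
        h = np.maximum(W @ h, 0.0)
    return float(v @ h)

rng = np.random.default_rng(0)
d, width, depth, c = 10, 32, 4, 1.0
hidden = [
    normalize_rows_l1(rng.normal(size=(width, d if i == 0 else width)), c)
    for i in range(depth)
]
v = rng.normal(size=width)                    # unconstrained output layer
xs = rng.uniform(-1.0, 1.0, size=(1000, d))   # inputs with ||x||_inf <= 1
outputs = np.array([forward(x, hidden, v) for x in xs])
bound = np.abs(v).sum() * c**depth            # ||v||_1 * c^k
print(np.abs(outputs).max(), bound)
```

With $c = 1$ the hidden layers cannot amplify their inputs, so the output scale is set entirely by $\|v\|_1$.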

Abstract

This paper presents a general framework for norm-based capacity control for $L_{p,q}$ weight normalized deep neural networks. We establish an upper bound on the Rademacher complexities of this family. With an $L_{p,q}$ normalization where $q \leq p^{*}$ and $1/p + 1/p^{*} = 1$, we discuss properties of a width-independent capacity control, which depends on depth only through a square-root term. We further analyze the approximation properties of $L_{p,q}$ weight normalized deep neural networks. In particular, for an $L_{1,\infty}$ weight normalized network, the approximation error can be controlled by the $L_{1}$ norm of the output layer, and the corresponding generalization error depends on the architecture only through the square root of the depth.
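The width-independence condition above involves the Hölder conjugate $p^{*}$. A small check, with hypothetical helper names, computes $p^{*}$ from $1/p + 1/p^{*} = 1$ and tests $q \leq p^{*}$; it does not evaluate the capacity bound itself.

```python
import math

def dual_exponent(p: float) -> float:
    """Hölder conjugate p* satisfying 1/p + 1/p* = 1, for p in [1, inf]."""
    if p < 1:
        raise ValueError("p must be >= 1")
    if p == 1:
        return math.inf      # 1/p* = 0
    if math.isinf(p):
        return 1.0           # 1/p = 0
    return p / (p - 1)

def width_independent(p: float, q: float) -> bool:
    # Hypothetical helper: checks only the stated condition q <= p*.
    return q <= dual_exponent(p)

print(width_independent(1, math.inf))  # L_{1,inf}: p* = inf, so True
print(width_independent(2, 3))         # p* = 2 < 3, so False
```

For $p = 1$ the conjugate is $p^{*} = \infty$, so every $q$, including $q = \infty$, satisfies the condition; this is consistent with the abstract singling out $L_{1,\infty}$ normalization. For $p = 2$ only $q \leq 2$ qualifies.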

Taxonomy

Topics: Machine Learning and Algorithms · Neural Networks and Applications · Machine Learning and ELM