Faster Training of Very Deep Networks Via p-Norm Gates

Trang Pham; Truyen Tran; Dinh Phung; Svetha Venkatesh

arXiv:1608.03639·stat.ML·August 15, 2016

TL;DR

This paper introduces a flexible p-norm gating scheme that speeds up the training of very deep neural networks by controlling information flow. The scheme subsumes existing gating methods and is reported to improve training efficiency significantly.

Contribution

The paper proposes a novel p-norm gating scheme that generalizes existing gating mechanisms and accelerates training of deep networks without additional computational overhead.

Findings

1. Improved training speed on large datasets

2. Unified framework for various gating schemes

3. No extra overhead in training process

Abstract

A major contributing factor to the recent advances in deep neural networks is structural units that let sensory information and gradients propagate easily. Gating is one such structure that acts as a flow control. Gates are employed in many recent state-of-the-art recurrent models such as LSTM and GRU, and feedforward models such as Residual Nets and Highway Networks. Gating enables learning in very deep networks with hundreds of layers and helps achieve record-breaking results in vision (e.g., ImageNet with Residual Nets) and NLP (e.g., machine translation with GRU). However, there is limited work on analysing the role of gating in the learning process. In this paper, we propose a flexible $p$-norm gating scheme, which allows user-controllable flow and, as a consequence, improves the learning speed. This scheme subsumes other existing gating schemes, including those in GRU, Highway Networks…
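The abstract excerpt does not spell out the gate equations, so the following is only a minimal sketch of how a p-norm gate might look. It assumes a two-gate update in which the gate on the new candidate, alpha1, is a sigmoid, and the gate on the carried state, alpha2, is chosen so that alpha1^p + alpha2^p = 1. Under that assumption, p = 1 gives the coupled gates used in GRU and Highway Networks, and larger p lets more of the previous state through. The function names and the scalar setting are hypothetical and chosen for illustration.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def p_norm_gates(z: float, p: float) -> Tuple[float, float]:
    """Return (alpha1, alpha2) satisfying alpha1**p + alpha2**p == 1.

    Assumed form, not taken from the excerpt: alpha1 is a sigmoid of the
    pre-activation z, and alpha2 is derived from the p-norm constraint.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    alpha1 = sigmoid(z)
    alpha2 = (1.0 - alpha1 ** p) ** (1.0 / p)
    return alpha1, alpha2


def gated_update(h_prev: float, h_candidate: float, z: float, p: float) -> float:
    """Mix the candidate and the previous state using the two gates."""
    alpha1, alpha2 = p_norm_gates(z, p)
    return alpha1 * h_candidate + alpha2 * h_prev


if __name__ == "__main__":
    # With the candidate gate half open (z = 0), the carry gate widens as p grows.
    for p in (1.0, 2.0, 5.0):
        a1, a2 = p_norm_gates(0.0, p)
        print(f"p={p}: alpha1={a1:.4f}, alpha2={a2:.4f}")
```

In this sketch p is a fixed hyperparameter rather than a learned quantity, so the only extra cost over a coupled sigmoid gate is the power computation.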

Taxonomy

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Highway networks · Sigmoid Activation · Tanh Activation · Gated Recurrent Unit · Long Short-Term Memory