Faster Training of Very Deep Networks Via p-Norm Gates
Trang Pham, Truyen Tran, Dinh Phung, Svetha Venkatesh

TL;DR
This paper introduces a flexible p-norm gating scheme that enhances the learning speed of very deep neural networks by controlling information flow, subsuming existing gating methods and demonstrating significant improvements in training efficiency.
Contribution
The paper proposes a novel p-norm gating scheme that generalizes existing gating mechanisms and accelerates training of deep networks without additional computational overhead.
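As a hedged illustration of how one gate family can "generalize existing gating mechanisms", assume a Highway- or GRU-style unit whose output mixes a candidate $\tilde{h}$ and the input $x$ through a transform gate $\alpha_1$ and a carry gate $\alpha_2$. Assume also that the two gates are tied by a unit p-norm. This exact form is an assumption for illustration and is not spelled out in this summary.

```latex
\begin{aligned}
y &= \alpha_1 \odot \tilde{h} + \alpha_2 \odot x, \qquad \alpha_1 = \sigma(W_t x + b_t),\\
\left(\alpha_1^{p} + \alpha_2^{p}\right)^{1/p} &= 1 \;\Longrightarrow\; \alpha_2 = \left(1 - \alpha_1^{p}\right)^{1/p},\\
p = 1 &: \quad \alpha_2 = 1 - \alpha_1 \quad \text{(Highway Network / GRU coupling)},\\
p \to \infty &: \quad \alpha_2 \to 1 \ \text{for } \alpha_1 < 1 \quad \text{(carry path fully open)},\\
p \ge 1 &: \quad \alpha_2 \ge 1 - \alpha_1, \ \text{since } \alpha_1^{p} + (1-\alpha_1)^{p} \le 1.
\end{aligned}
```

Under this assumed form, p acts as a single knob. At p = 1 it recovers the convex combination used by existing gates, and larger p lets more of the input pass through unchanged.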
Findings
Improved training speed on large datasets
Unified framework for various gating schemes
No extra overhead in training process
Abstract
A major contributing factor to the recent advances in deep neural networks is structural units that let sensory information and gradients propagate easily. Gating is one such structure that acts as a flow control. Gates are employed in many recent state-of-the-art recurrent models such as LSTM and GRU, and feedforward models such as Residual Nets and Highway Networks. This enables learning in very deep networks with hundreds of layers and helps achieve record-breaking results in vision (e.g., ImageNet with Residual Nets) and NLP (e.g., machine translation with GRU). However, there is limited work in analysing the role of gating in the learning process. In this paper, we propose a flexible p-norm gating scheme, which allows user-controllable flow and, as a consequence, improves the learning speed. This scheme subsumes other existing gating schemes, including those in GRU, Highway Networks…
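The sketch below shows a p-norm-gated feedforward layer in NumPy. It is a minimal illustration, not the authors' code, and it rests on several assumptions. The layer is assumed to have the Highway-style form y = α1 ⊙ h̃ + α2 ⊙ x. The transform gate is assumed to be α1 = sigmoid(·), and the carry gate is assumed to be α2 = (1 − α1^p)^(1/p). The tanh candidate, the negative gate bias in the demo, and the function names `pnorm_carry` and `highway_pnorm_layer` are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pnorm_carry(alpha1, p):
    """Carry gate under the ASSUMED p-norm coupling:
    alpha2 = (1 - alpha1**p) ** (1/p), so (alpha1**p + alpha2**p) ** (1/p) == 1.
    p = 1 gives alpha2 = 1 - alpha1 (Highway/GRU-style coupling)."""
    # Clip guards against tiny negative values from floating-point error.
    return np.clip(1.0 - alpha1 ** p, 0.0, 1.0) ** (1.0 / p)


def highway_pnorm_layer(x, W_h, b_h, W_t, b_t, p=2.0):
    """One hypothetical Highway-style layer with a p-norm-coupled carry gate."""
    h_tilde = np.tanh(x @ W_h + b_h)   # candidate transform
    alpha1 = sigmoid(x @ W_t + b_t)    # transform gate in (0, 1)
    alpha2 = pnorm_carry(alpha1, p)    # carry gate, tied to alpha1 via p
    return alpha1 * h_tilde + alpha2 * x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, depth = 16, 100
    x = rng.standard_normal((2, d))
    for p in (1.0, 2.0, 8.0):
        h = x
        # Stack many randomly initialised layers to see how p changes signal flow.
        for _ in range(depth):
            W_h = rng.standard_normal((d, d)) / np.sqrt(d)
            W_t = rng.standard_normal((d, d)) / np.sqrt(d)
            h = highway_pnorm_layer(h, W_h, np.zeros(d), W_t, np.full(d, -1.0), p)
        print(f"p={p}: output norm {np.linalg.norm(h):.2f}")
```

Only `p` changes between runs in the demo loop. Because α2 ≥ 1 − α1 whenever p ≥ 1, raising p can only widen the carry path relative to the p = 1 baseline. This is one concrete reading of "user-controllable flow".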
Taxonomy
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Highway networks · Sigmoid Activation · Tanh Activation · Gated Recurrent Unit · Long Short-Term Memory
