# Input Fast-Forwarding for Better Deep Learning

**Authors:** Ahmed Ibrahim, A. Lynn Abbott, Mohamed E. Hussein

arXiv: 1705.08479 · 2017-05-25

## TL;DR

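This paper proposes input fast-forwarding, an architectural scheme that adds a parallel path carrying representations of the input forward to deeper layers. The extra path lets layers combine low-level and high-level information and shortens the route for gradient backpropagation, which reduces vanishing gradients and improves training and performance.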

## Contribution

The paper introduces input fast-forwarding, a novel architectural scheme that improves the training and performance of deep networks. It is distinct from deep supervision, in which the loss layer is re-introduced to earlier layers.

## Key findings

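- FFNet, a 20-convolutional-layer network with parallel fast-forward paths, shows better learning capacity than GoogLeNet (with deep supervision) and CaffeNet, which are 4x and 18x larger, respectively.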
- Fast-forwarding substantially reduces the vanishing-gradient problem, because the fast-forward path provides a shorter route for gradient backpropagation.
- Empirical results show improved learning capacity due to the proposed architecture.
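To make the vanishing-gradient argument concrete, here is a toy scalar sketch (not the paper's network). The main path is a chain of `depth` tanh layers with weight `w`; a hypothetical fast-forward term `v * x` adds the input directly at the output. The main-path gradient is a product of per-layer factors, each smaller than `w`, so it shrinks geometrically with depth, while the shortcut contributes a gradient of `v` in a single hop. The function name `toy_gradients` and the choice of where the shortcut joins are assumptions for illustration.

```python
import math


def toy_gradients(x: float, depth: int, w: float, v: float = 0.0):
    """Toy scalar network (hypothetical, for illustration only).

    Main path:    h_0 = x,  h_k = tanh(w * h_{k-1})  for k = 1..depth
    Fast-forward: the input is scaled by v and added at the output,
                  so y = h_depth + v * x.
    Returns (y, main-path gradient, total gradient dy/dx).
    """
    h = x
    grad_main = 1.0
    for _ in range(depth):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2),
        # and h now holds tanh(w * h_prev).
        grad_main *= w * (1.0 - h * h)
    y = h + v * x
    # The shortcut adds a one-hop term v to the long product from the main path.
    return y, grad_main, grad_main + v


_, g_plain, _ = toy_gradients(1.0, depth=20, w=0.5)
_, _, g_ff = toy_gradients(1.0, depth=20, w=0.5, v=1.0)
print(f"main path only: {g_plain:.2e}, with fast-forward: {g_ff:.6f}")
```

With `w = 0.5` and 20 layers, the main-path gradient falls below 0.5^20 (about 1e-6), whereas the shortcut with `v = 1.0` keeps the total gradient close to 1.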

## Abstract

This paper introduces a new architectural framework, known as input fast-forwarding, that can enhance the performance of deep networks. The main idea is to incorporate a parallel path that sends representations of input values forward to deeper network layers. This scheme is substantially different from "deep supervision" in which the loss layer is re-introduced to earlier layers. The parallel path provided by fast-forwarding enhances the training process in two ways. First, it enables the individual layers to combine higher-level information (from the standard processing path) with lower-level information (from the fast-forward path). Second, this new architecture reduces the problem of vanishing gradients substantially because the fast-forwarding path provides a shorter route for gradient backpropagation. In order to evaluate the utility of the proposed technique, a Fast-Forward Network (FFNet), with 20 convolutional layers along with parallel fast-forward paths, has been created and tested. The paper presents empirical results that demonstrate improved learning capacity of FFNet due to fast-forwarding, as compared to GoogLeNet (with deep supervision) and CaffeNet, which are 4x and 18x larger in size, respectively. All of the source code and deep learning models described in this paper will be made available to the entire research community.
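The idea of a parallel path that sends input representations to deeper layers can be sketched as a NumPy forward pass, under stated assumptions. This summary does not say how FFNet represents the input on the fast-forward path or exactly where the paths join, so here the input is assumed to be average-pooled to each stage's resolution and concatenated along the channel axis with the standard-path features. Each toy stage uses a 1x1 convolution plus ReLU and 2x2 average pooling in place of FFNet's real convolutional layers; `fast_forward_net`, `conv1x1_relu`, and `avg_pool2` are hypothetical names.

```python
import numpy as np


def avg_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling with stride 2 on a (C, H, W) array (H and W even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def conv1x1_relu(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Toy layer: 1x1 convolution (channel mixing, weights of shape (C_out, C_in)) then ReLU."""
    return np.maximum(np.einsum("oc,chw->ohw", weights, x), 0.0)


def fast_forward_net(image: np.ndarray, stage_weights: list) -> np.ndarray:
    """Forward pass with a hypothetical input fast-forward path.

    Standard path: toy layer + downsampling at each stage.
    Fast-forward path: the raw input, pooled to the same resolution, is
    concatenated with the standard-path features before the next stage,
    so deeper layers see high-level and low-level information together.
    """
    features = image
    shortcut = image  # representation of the input values carried forward
    for weights in stage_weights:
        features = avg_pool2(conv1x1_relu(features, weights))
        shortcut = avg_pool2(shortcut)  # match the current spatial resolution
        features = np.concatenate([features, shortcut], axis=0)
    return features


rng = np.random.default_rng(0)
image = rng.standard_normal((3, 32, 32))
# Each stage's C_in = previous C_out + 3 fast-forwarded input channels.
weights = [
    rng.standard_normal((8, 3)),
    rng.standard_normal((16, 8 + 3)),
    rng.standard_normal((32, 16 + 3)),
]
out = fast_forward_net(image, weights)
print(out.shape)  # (35, 4, 4)
```

Concatenation is one simple way to let a layer combine both sources side by side; the last 3 output channels here are just the input averaged over 8x8 blocks, the low-level information that the standard path alone would have transformed away.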

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.08479/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1705.08479/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1705.08479/full.md

---
Source: https://tomesphere.com/paper/1705.08479