Information-Theoretic Greedy Layer-wise Training for Traffic Sign Recognition
Shuyan Lyu, Zhanzimo Wu, Junliang Du

TL;DR
This paper introduces a novel information-theoretic layer-wise training method for deep CNNs, improving training efficiency and performance on traffic sign recognition without backpropagation.
Contribution
It proposes a new layer-wise training approach based on the deterministic information bottleneck and Rényi entropy, validated on CIFAR datasets and traffic sign recognition.
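To make the two ingredients named above concrete, the sketch below estimates a Rényi entropy of order 2 from a normalized Gaussian Gram matrix and combines it into a deterministic information bottleneck (DIB) style objective, H(T) - beta * I(T;Y). It is only an illustration under assumptions: the matrix-based estimator, the choice of order 2, the Gaussian kernel width `sigma`, and all function names are hypothetical and are not taken from the paper, whose exact estimator and loss are not given in this summary.

```python
import math

def gaussian_gram(xs, sigma=1.0):
    """Gaussian Gram matrix of the samples, scaled to unit trace (assumed kernel)."""
    n = len(xs)
    k = [[math.exp(-sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                   / (2.0 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # A Gaussian kernel has K_ii = 1, so dividing by n gives trace 1.
    return [[v / n for v in row] for row in k]

def renyi2_entropy(a):
    """Matrix-based Renyi entropy of order 2, in bits: -log2(tr(A^2))."""
    # For a symmetric matrix, tr(A^2) equals the sum of its squared entries,
    # which is also the sum of its squared eigenvalues.
    return -math.log2(sum(v * v for row in a for v in row))

def joint_entropy(a, b):
    """Joint entropy from the Hadamard product, renormalized to unit trace."""
    n = len(a)
    h = [[a[i][j] * b[i][j] for j in range(n)] for i in range(n)]
    tr = sum(h[i][i] for i in range(n))
    return renyi2_entropy([[v / tr for v in row] for row in h])

def mutual_information(a, b):
    """I(A;B) = S(A) + S(B) - S(A,B) for the matrix-based estimator."""
    return renyi2_entropy(a) + renyi2_entropy(b) - joint_entropy(a, b)

def dib_objective(a_t, a_y, beta):
    """DIB-style layer objective to minimize: H(T) - beta * I(T;Y)."""
    return renyi2_entropy(a_t) - beta * mutual_information(a_t, a_y)

# Four well-separated 1-D samples: the Gram matrix is close to I/4.
spread = gaussian_gram([[0.0], [100.0], [200.0], [300.0]])
# Four identical samples: every Gram entry equals 1/4.
collapsed = gaussian_gram([[1.0], [1.0], [1.0], [1.0]])
print(renyi2_entropy(spread), renyi2_entropy(collapsed))
```

In this toy setting, well-separated samples give the maximum entropy of log2(n) bits and identical samples give zero, which is the behavior a compression term H(T) relies on. In a layer-wise setup, `a_t` would be built from one layer's activations and `a_y` from the labels, with each layer optimized against its own objective instead of a global backpropagated loss.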
Findings
Layer-wise training converges from bottom to top following an information bottleneck.
The proposed method outperforms existing layer-wise approaches.
The proposed method achieves performance comparable to standard SGD training.
Abstract
Modern deep neural networks (DNNs) are typically trained with a global cross-entropy loss in a supervised end-to-end manner: neurons need to store their outgoing weights, and training alternates between a forward pass (computation) and a top-down backward pass (learning), which is biologically implausible. Alternatively, greedy layer-wise training eliminates the need for cross-entropy loss and backpropagation. By avoiding the computation of intermediate gradients and the storage of intermediate outputs, it reduces memory usage and helps mitigate issues such as vanishing or exploding gradients. However, most existing layer-wise training approaches have been evaluated only on relatively small datasets with simple deep architectures. In this paper, we first systematically analyze the training dynamics of popular convolutional neural networks (CNNs) trained by stochastic gradient descent (SGD)…
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
