Techniques for Learning Binary Stochastic Feedforward Neural Networks
Tapani Raiko, Mathias Berglund, Guillaume Alain, Laurent Dinh

TL;DR
This paper explores training techniques for stochastic binary neural networks, proposing new estimators that improve training effectiveness and benchmarking their performance against existing methods.
Contribution
It introduces two novel gradient estimators for training stochastic binary networks and provides benchmark tests comparing their performance to existing estimators.
Findings
Proposed estimators outperform existing methods in training stochastic networks.
With a single sample of hidden activations per input (M=1), training behaves fundamentally differently: the network tries to avoid stochasticity.
Benchmark tests demonstrate the effectiveness of the new estimators.
Abstract
Stochastic binary hidden units in a multi-layer perceptron (MLP) network give at least three potential benefits when compared to deterministic MLP networks. (1) They allow learning one-to-many mappings. (2) They can be used in structured prediction problems, where modeling the internal structure of the output is important. (3) Stochasticity has been shown to be an excellent regularizer, which potentially improves generalization performance. However, training stochastic networks is considerably more difficult. We study training using M samples of hidden activations per input. We show that the case M=1 leads to a fundamentally different behavior where the network tries to avoid stochasticity. We propose two new estimators for the training gradient and propose benchmark tests for comparing training algorithms. Our experiments confirm that training stochastic networks…
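To make "M samples of hidden activations per input" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single hidden layer of Bernoulli units with sigmoid firing probabilities, Bernoulli outputs, and a Monte Carlo estimate of log P(y|x) formed by averaging P(y|h) over the M sampled hidden vectors. The function names (`sample_hidden`, `mc_log_likelihood`) and parameter names (`W`, `b`, `V`, `c`) are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_hidden(x, W, b, M, rng):
    """Draw M binary hidden vectors for one input x (assumed sigmoid-Bernoulli units)."""
    p = sigmoid(W @ x + b)                               # firing probabilities, shape (H,)
    h = (rng.random((M, p.size)) < p).astype(float)      # binary samples, shape (M, H)
    return p, h

def mc_log_likelihood(x, y, W, b, V, c, M, rng):
    """Estimate log P(y|x) as the log of the mean of P(y|h) over M hidden samples.

    Assumption: outputs are independent Bernoulli variables given h.
    """
    _, h = sample_hidden(x, W, b, M, rng)
    q = sigmoid(h @ V.T + c)                             # output probabilities, shape (M, D)
    log_p = np.sum(y * np.log(q) + (1 - y) * np.log(1 - q), axis=1)  # log P(y|h), shape (M,)
    m = log_p.max()                                      # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(log_p - m)))

# Toy usage with arbitrary, randomly drawn parameters
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), np.zeros(4)              # input (3) -> hidden (4)
V, c = rng.normal(size=(2, 4)), np.zeros(2)              # hidden (4) -> output (2)
x, y = np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0])
print(mc_log_likelihood(x, y, W, b, V, c, M=20, rng=rng))
```

In this sketch, setting M=1 reduces the estimate to log P(y|h) for a single sampled h, so no averaging over hidden configurations takes place inside the logarithm.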
Taxonomy
Topics: Neural Networks and Applications · Gaussian Processes and Bayesian Inference · Anomaly Detection Techniques and Applications
