Learning Sparse Structured Ensembles with SG-MCMC and Network Pruning

Yichi Zhang; Zhijian Ou

arXiv:1803.00184 · stat.ML · May 24, 2018

TL;DR

This paper introduces a two-stage method combining SG-MCMC, group sparse priors, and pruning to efficiently learn neural network ensembles with high accuracy and reduced computational costs.

Contribution

To the authors' knowledge, this is the first work to integrate SG-MCMC, group sparse priors, and network pruning for learning neural network ensembles, reducing memory and computation cost while keeping prediction accuracy high.

Findings

01

Achieved a 21% reduction in language model perplexity relative to the baseline.

02

Reduced model parameters to 30% of the original.

03

Reduced the ensemble's total computation to 70% of the baseline's.

Abstract

An ensemble of neural networks is known to be more robust and accurate than an individual network, but usually at a linearly increased cost in both training and testing. In this work, we propose a two-stage method to learn Sparse Structured Ensembles (SSEs) for neural networks. In the first stage, we run SG-MCMC with group sparse priors to draw an ensemble of samples from the posterior distribution of network parameters. In the second stage, we apply weight pruning to each sampled network and then retrain over the remaining connections. By learning SSEs with SG-MCMC and pruning in this way, we not only achieve high prediction accuracy, since SG-MCMC enhances exploration of the model-parameter space, but also significantly reduce memory and computation cost in both training and testing of NN ensembles. This is thoroughly evaluated in the experiments of learning SSE ensembles…
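
The two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It makes several assumptions: a linear model stands in for a neural network, plain SGLD is used as the SG-MCMC sampler, each weight-matrix row is treated as one group in a group-lasso prior, pruning is by weight magnitude, and every function name and hyperparameter (`sgld_ensemble`, `prune_and_retrain`, `keep=0.3`, and so on) is hypothetical.

```python
import numpy as np

def group_lasso_grad(W, lam, eps=1e-8):
    """Gradient of lam * sum_i ||W[i, :]||_2, treating each row as one group."""
    norms = np.sqrt((W ** 2).sum(axis=1, keepdims=True) + eps)
    return lam * W / norms

def sgld_ensemble(X, y, n_members=4, steps=2000, step_size=5e-4,
                  lam=1.0, batch=32, seed=0):
    """Stage 1 (assumed sampler: SGLD): draw n_members weight samples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 0.1 * rng.standard_normal((d, y.shape[1]))
    thin = steps // n_members
    members = []
    for t in range(1, steps + 1):
        idx = rng.choice(n, size=batch, replace=False)
        resid = X[idx] @ W - y[idx]
        # Minibatch estimate of the gradient of the negative log posterior
        # (unit-variance Gaussian likelihood plus group-sparse prior).
        grad = (n / batch) * (X[idx].T @ resid) + group_lasso_grad(W, lam)
        noise = np.sqrt(step_size) * rng.standard_normal(W.shape)
        W = W - 0.5 * step_size * grad + noise
        if t % thin == 0:
            members.append(W.copy())
    return members

def prune_and_retrain(W, X, y, keep=0.3, steps=200, lr=0.1):
    """Stage 2: keep the largest-magnitude weights, then retrain only those."""
    thresh = np.quantile(np.abs(W), 1.0 - keep)
    mask = (np.abs(W) >= thresh).astype(W.dtype)
    W = W * mask
    for _ in range(steps):
        grad = X.T @ (X @ W - y) / X.shape[0]
        W = W - lr * grad * mask  # pruned connections stay at zero
    return W, mask

def ensemble_predict(members, X):
    """Average the predictions of all ensemble members."""
    return np.mean([X @ W for W in members], axis=0)

# Toy data (an assumption for illustration): 5 of 20 input rows carry signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((256, 20))
W_true = np.zeros((20, 3))
signs = rng.choice([-1.0, 1.0], size=(5, 3))
W_true[:5] = rng.uniform(1.0, 2.0, size=(5, 3)) * signs
y = X @ W_true + 0.1 * rng.standard_normal((256, 3))

dense = sgld_ensemble(X, y)
sparse = [prune_and_retrain(W, X, y)[0] for W in dense]
mse = np.mean((ensemble_predict(sparse, X) - y) ** 2)
density = np.mean([np.count_nonzero(W) / W.size for W in sparse])
print(f"ensemble MSE: {mse:.4f}, mean density: {density:.2f}")
```

The mask is fixed before retraining, so each member keeps the same sparsity pattern throughout the second stage. On this toy problem the SGLD samples differ only by posterior noise; the exploration benefit the abstract attributes to SG-MCMC would only show up on a multimodal model such as a real neural network.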

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications

Methods: Pruning · Sigmoid Activation · Tanh Activation · Long Short-Term Memory