Regularising Deep Networks with Deep Generative Models

Matthew Willetts; Alexander Camuto; Stephen Roberts; Chris Holmes

arXiv:1909.11507 · cs.LG · October 14, 2019

TL;DR

This paper introduces a regularisation technique for neural networks that learns a probability distribution over the network's activations and inserts imputed activation values during training, improving test accuracy and uncertainty calibration on the CIFAR-10 and SVHN image classification tasks.

Contribution

The method generalises data augmentation to the hidden layers of a network using deep generative models, acting as a form of data-aware dropout that improves both regularisation and model calibration.

Findings

01. Higher test accuracy on CIFAR-10 and SVHN

02. Lower test-set cross-entropy compared to baselines

03. Better calibrated uncertainty over class posteriors

Abstract

We develop a new method for regularising neural networks. We learn a probability distribution over the activations of all layers of the model and then insert imputed values into the network during training. We obtain a posterior for an arbitrary subset of activations conditioned on the remainder. This is a generalisation of data augmentation to the hidden layers of a network, and a form of data-aware dropout. We demonstrate that our training method leads to higher test accuracy and lower test-set cross-entropy for neural networks trained on CIFAR-10 and SVHN compared to standard regularisation baselines: our approach leads to networks with better calibrated uncertainty over the class posteriors all the while delivering greater test-set accuracy.
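
The imputation step described in the abstract can be illustrated with a deliberately simplified sketch. The paper learns the distribution over activations with a deep generative model; the code below swaps in a single multivariate Gaussian fitted to recorded activations, purely because its conditionals have a closed form. The function names `fit_gaussian` and `impute_subset`, the `jitter` parameter, and the toy data are all hypothetical and not taken from the paper.

```python
import numpy as np

def fit_gaussian(acts, jitter=1e-3):
    """Fit a mean and covariance to activations of shape (n_samples, n_units).

    Assumption: a plain Gaussian stands in for the paper's deep generative model.
    """
    mean = acts.mean(axis=0)
    cov = np.cov(acts, rowvar=False) + jitter * np.eye(acts.shape[1])
    return mean, cov

def impute_subset(h, mask, mean, cov, rng):
    """Resample the units where mask is True, conditioned on the remaining units.

    h is one activation vector of shape (n_units,); mask is boolean, same shape.
    """
    m, o = mask, ~mask
    if not m.any():          # nothing to impute
        return h.copy()
    if not o.any():          # nothing to condition on: draw from the marginal
        return rng.multivariate_normal(mean, cov)
    cov_oo = cov[np.ix_(o, o)]
    cov_mo = cov[np.ix_(m, o)]
    cov_mm = cov[np.ix_(m, m)]
    # gain = cov_mo @ inv(cov_oo), computed with a linear solve
    gain = np.linalg.solve(cov_oo, cov_mo.T).T
    cond_mean = mean[m] + gain @ (h[o] - mean[o])
    cond_cov = cov_mm - gain @ cov_mo.T
    cond_cov = (cond_cov + cond_cov.T) / 2   # symmetrise against round-off
    out = h.copy()
    out[m] = rng.multivariate_normal(cond_mean, cond_cov)
    return out

# Toy usage: correlated "activations" for a 6-unit layer.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))
mean, cov = fit_gaussian(acts)
mask = np.array([True, False, True, False, False, True])
h_new = impute_subset(acts[0], mask, mean, cov, rng)
```

In a training loop, a vector like `h_new` would replace the original activations before the forward pass continues. Unlike standard dropout, which zeroes the selected units, the replaced values here depend on the units that were kept, which is the sense in which the abstract calls the method data-aware.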

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning