Understanding data augmentation for classification: when to warp?

Sebastien C. Wong; Adam Gatt; Victor Stamatescu; Mark D. McDonnell

arXiv:1609.08764 · cs.CV · November 29, 2016

TL;DR

This paper compares data augmentation techniques in data-space and feature-space for image classification, finding data-space augmentation more effective when plausible transformations are known.

Contribution

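It provides an empirical comparison of data warping (augmentation in data-space) and synthetic over-sampling (augmentation in feature-space) for training convolutional neural network, support vector machine and extreme learning machine classifiers.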
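To make the data-space side of the comparison concrete, the sketch below augments a set of grayscale images with small random rotations and translations and copies each label unchanged. It is a minimal illustration under stated assumptions. The abstract does not say which warps were used, so the rotation-plus-shift transform, the parameter ranges (`max_angle`, `max_shift`) and the helper names `affine_warp` and `warp_augment` are hypothetical choices. They were picked only because small rotations and shifts are plausible label-preserving changes for handwritten digits. Images are plain nested lists, so the example needs only the Python standard library.

```python
import math
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]  # row-major grayscale image, e.g. 28x28 for MNIST


def affine_warp(img: Image, angle_deg: float, dx: float, dy: float) -> Image:
    """Rotate img about its centre by angle_deg, then shift it by (dx, dy) pixels.

    Positive dx moves the content right and positive dy moves it down.
    Uses inverse mapping with nearest-neighbour sampling; output pixels whose
    source falls outside the image are set to 0 (background).
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the shift, then undo the rotation about the image centre.
            xr, yr = x - dx - cx, y - dy - cy
            src_x = cos_t * xr + sin_t * yr + cx
            src_y = -sin_t * xr + cos_t * yr + cy
            sx, sy = int(round(src_x)), int(round(src_y))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def warp_augment(
    images: Sequence[Image],
    labels: Sequence[int],
    copies: int,
    max_angle: float = 15.0,  # hypothetical range, in degrees
    max_shift: float = 2.0,   # hypothetical range, in pixels
    seed: int = 0,
) -> Tuple[List[Image], List[int]]:
    """Return the originals plus `copies` randomly warped versions of each image.

    Labels are copied unchanged, which assumes every warp is label-preserving.
    """
    rng = random.Random(seed)
    aug_x, aug_y = list(images), list(labels)
    for img, label in zip(images, labels):
        for _ in range(copies):
            aug_x.append(
                affine_warp(
                    img,
                    angle_deg=rng.uniform(-max_angle, max_angle),
                    dx=rng.uniform(-max_shift, max_shift),
                    dy=rng.uniform(-max_shift, max_shift),
                )
            )
            aug_y.append(label)
    return aug_x, aug_y
```

For example, `warp_augment(train_images, train_labels, copies=4)` would return five times as many training images. Data warping rests on the assumption that the transform does not change the class. A rotation large enough to turn a 6 into a 9 would break that assumption, which is why the angle range is kept small here.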

Findings

01

Data-space augmentation yields better performance when plausible transformations are available.

02

Augmentation reduces overfitting across different classifiers.

03

Generic feature-space augmentation is possible, but it gives a smaller benefit than data-space augmentation when plausible transformations for the data are known.

Abstract

In this paper we investigate the benefit of augmenting data with synthetically created samples when training a machine learning classifier. Two approaches for creating additional training samples are data warping, which generates additional samples through transformations applied in the data-space, and synthetic over-sampling, which creates additional samples in feature-space. We experimentally evaluate the benefits of data augmentation for a convolutional backpropagation-trained neural network, a convolutional support vector machine and a convolutional extreme learning machine classifier, using the standard MNIST handwritten digit dataset. We found that while it is possible to perform generic augmentation in feature-space, if plausible transforms for the data are known then augmentation in data-space provides a greater benefit for improving performance and reducing overfitting.
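Feature-space augmentation can be sketched in the same spirit. The abstract does not name the over-sampling algorithm, so this example assumes a SMOTE-style scheme: each synthetic vector is placed at a random point on the line segment between a sample and one of its `k` nearest neighbours from the same class. The inputs are assumed to be feature vectors already produced by some feature extractor (for example, a network's penultimate layer). The function names `smote_class` and `oversample` are hypothetical. Classic SMOTE targets only a minority class; this sketch instead adds the same number of synthetic samples to every class, treating over-sampling as generic augmentation.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]  # one feature vector, assumed to come from a feature extractor


def smote_class(features: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic vectors for one class by SMOTE-style interpolation.

    Each synthetic vector lies on the segment between a randomly chosen
    sample and one of its k nearest same-class neighbours (Euclidean distance).
    """
    if len(features) < 2:
        raise ValueError("need at least two samples of the class to interpolate")
    rng = random.Random(seed)
    k = min(k, len(features) - 1)
    # Precompute each sample's k nearest same-class neighbours, excluding itself.
    neighbours = []
    for i, x in enumerate(features):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(features) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(features))
        j = rng.choice(neighbours[i])
        u = rng.random()  # interpolation weight in [0, 1)
        base, nb = features[i], features[j]
        synthetic.append([a + u * (b - a) for a, b in zip(base, nb)])
    return synthetic


def oversample(
    features: Sequence[Vector],
    labels: Sequence[int],
    n_per_class: int,
    k: int = 5,
    seed: int = 0,
) -> Tuple[List[Vector], List[int]]:
    """Return the original data plus n_per_class synthetic vectors for every class."""
    by_class: Dict[int, List[Vector]] = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)
    out_x, out_y = list(features), list(labels)
    # Use a different seed per class so classes do not share a random stream.
    for offset, (y, feats) in enumerate(sorted(by_class.items())):
        new = smote_class(feats, n_per_class, k=k, seed=seed + offset)
        out_x.extend(new)
        out_y.extend([y] * len(new))
    return out_x, out_y
```

Because the new points are interpolations between existing same-class features, this needs no knowledge of which transformations are plausible for the raw data, which is what makes it generic. The trade-off is that it can only fill in the region already spanned by each class's existing samples.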