Learning to Compose Domain-Specific Transformations for Data Augmentation
Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré

TL;DR
This paper introduces a method to automatically learn and compose data transformations for augmentation using a generative adversarial approach, improving model performance across image and text tasks.
Contribution
It presents a novel approach to automate the construction of complex data augmentation transformations using a generative sequence model trained adversarially.
Findings
Improved accuracy on CIFAR-10 by 4.0 points over standard heuristic augmentation
Improved F1 on the ACE relation extraction task by 1.4 points
Improved accuracy on a medical imaging dataset by 3.4 points when using domain-specific transformations
Abstract
Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual transformations, constructing and tuning the more sophisticated compositions typically needed to achieve state-of-the-art results is a time-consuming manual task in practice. We propose a method for automating this process by learning a generative sequence model over user-specified transformation functions using a generative adversarial approach. Our method can make use of arbitrary, non-deterministic transformation functions, is robust to misspecified user input, and is trained on unlabeled data. The learned transformation model can then be used to perform data augmentation for any end discriminative model. In our experiments, we show the efficacy of our…
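To make the idea of composing user-specified transformation functions concrete, here is a minimal sketch. It is not the authors' implementation: the three toy transformations (`shift`, `scale`, `reverse`), the helper names, and the uniform random sampler are all assumptions made for illustration. In the paper, the sequence of transformations is drawn from a generative sequence model trained adversarially; the sketch replaces that learned model with uniform sampling so that only the composition step is shown.

```python
import random
from typing import Callable, List

# A transformation function (TF) maps a data point to a transformed data point.
# Data points are plain lists of floats here; this is a hypothetical stand-in
# for images or sentences.
Point = List[float]
TF = Callable[[Point], Point]


def shift(x: Point) -> Point:
    """Toy TF: add a small constant offset to every value."""
    return [v + 0.1 for v in x]


def scale(x: Point) -> Point:
    """Toy TF: multiply every value by a constant factor."""
    return [v * 1.1 for v in x]


def reverse(x: Point) -> Point:
    """Toy TF: reverse the order of the values."""
    return list(reversed(x))


TFS: List[TF] = [shift, scale, reverse]


def sample_sequence(num_tfs: int, length: int, rng: random.Random) -> List[int]:
    """Sample a sequence of TF indices.

    Assumption: uniform sampling stands in for the learned sequence model.
    """
    return [rng.randrange(num_tfs) for _ in range(length)]


def apply_sequence(x: Point, tfs: List[TF], indices: List[int]) -> Point:
    """Apply the selected TFs to x, one after another, in order."""
    for i in indices:
        x = tfs[i](x)
    return x


def augment(dataset: List[Point], tfs: List[TF], length: int,
            rng: random.Random) -> List[Point]:
    """Return one augmented copy of each data point in the dataset."""
    return [
        apply_sequence(x, tfs, sample_sequence(len(tfs), length, rng))
        for x in dataset
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    for original, augmented in zip(data, augment(data, TFS, length=3, rng=rng)):
        print(original, "->", augmented)
```

Swapping `sample_sequence` for a trained model would leave `apply_sequence` and `augment` unchanged, which is why the sketch keeps sampling and application as separate functions.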
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Web Data Mining and Analysis
