SAFLEX: Self-Adaptive Augmentation via Feature Label Extrapolation

Mucong Ding; Bang An; Yuancheng Xu; Anirudh Satheesh; Furong Huang

arXiv:2410.02512 · cs.LG · October 4, 2024

Open Access · 3 Reviews

TL;DR

SAFLEX is an efficient, self-adaptive data augmentation method that learns to optimize augmented sample weights and labels, improving model robustness across diverse datasets and tasks with minimal additional computational cost.

Contribution

We introduce SAFLEX, a novel bilevel optimization-based approach that enhances existing augmentation pipelines by learning sample weights and labels, reducing noise and errors effectively.
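For orientation, a generic bilevel program of the kind this description suggests can be written as below. This is a sketch under assumptions, not the paper's exact formulation: the symbols $w_i$ (sample weights), $\tilde{y}_i$ (soft labels), $x_i^{\mathrm{aug}}$ (augmented inputs), $f_\theta$ (model), $\ell$ (per-sample loss), and the choice of a held-out loss $\mathcal{L}_{\mathrm{val}}$ as the outer objective are all introduced here for illustration.

```latex
\begin{aligned}
\min_{w,\,\tilde{y}} \quad & \mathcal{L}_{\mathrm{val}}\bigl(\theta^{*}(w,\tilde{y})\bigr) \\
\text{s.t.} \quad & \theta^{*}(w,\tilde{y}) \in \arg\min_{\theta} \sum_{i} w_i\, \ell\bigl(f_{\theta}(x_i^{\mathrm{aug}}),\, \tilde{y}_i\bigr)
\end{aligned}
```

In this reading, the inner problem trains the model on the reweighted, relabeled augmented samples, and the outer problem adjusts the weights and labels so that the trained model performs well on held-out data.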

Findings

01 Effective across natural, medical, and tabular data

02 Improves few-shot learning and out-of-distribution generalization

03 Seamlessly integrates with popular augmentation methods

Abstract

Data augmentation, a cornerstone technique in deep learning, is crucial in enhancing model performance, especially with scarce labeled data. While traditional techniques are effective, their reliance on hand-crafted methods limits their applicability across diverse data types and tasks. Although modern learnable augmentation methods offer increased adaptability, they are computationally expensive and challenging to incorporate within prevalent augmentation workflows. In this work, we present a novel, efficient method for data augmentation, effectively bridging the gap between existing augmentation strategies and emerging datasets and learning tasks. We introduce SAFLEX (Self-Adaptive Augmentation via Feature Label EXtrapolation), which learns the sample weights and soft labels of augmented samples provided by any given upstream augmentation pipeline, using a specifically designed…
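To make the idea of per-sample weights and soft labels concrete, here is a minimal NumPy sketch of a weighted soft-label cross-entropy loss over a batch of augmented samples. It only illustrates how such weights and labels could enter a training loss; it is not the SAFLEX algorithm, and it does not show how the weights and labels are learned. The function names `log_softmax` and `weighted_soft_label_loss`, and the toy values, are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def weighted_soft_label_loss(logits, soft_labels, weights):
    # Per-sample cross-entropy against a soft (probability) label,
    # then a weighted average using per-sample weights.
    # Illustrative only: hypothetical helper, not the paper's objective.
    per_sample = -(soft_labels * log_softmax(logits)).sum(axis=1)
    total_weight = weights.sum()
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return float((weights * per_sample).sum() / total_weight)


# Toy batch of two augmented samples over three classes (made-up values).
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])
soft_labels = np.array([[1.0, 0.0, 0.0],   # confident label
                        [0.5, 0.5, 0.0]])  # ambiguous augmented sample
weights = np.array([1.0, 0.0])             # second sample is ignored

loss = weighted_soft_label_loss(logits, soft_labels, weights)
print(loss)
```

Setting a weight to zero removes that augmented sample from the loss entirely, while a soft label spreads the target across classes when the augmentation makes the true class ambiguous.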

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 3

Strengths

The paper is well-written and easy to understand. The diagrams and the equations are easy to follow. The experiments are performed on diverse datasets with various tasks, including medical imaging and tabular data. The results are highly encouraging.

Weaknesses

A few important previous works on sampling and purifying GAN synthetic data are relevant to this paper. Their contributions should be acknowledged and discussed in the paper:

Caramalau, Razvan, Binod Bhattarai, and Tae-Kyun Kim. "Sequential graph convolutional network for active learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

Bhattarai, Binod, et al. "Sampling strategies for GAN synthetic data." ICASSP 2020-2020 IEEE International Conf

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. The paper's motivation is very clear: it identifies two issues with standard augmentation and addresses them by learning sample weights and soft labels.

Weaknesses

1. The main issue is a lack of proper baselines. Papers such as [1] have already explored soft labels for augmentations, where the softness is derived from the augmentation strength. This limits the paper's novelty. None of the experiments compares against [1]. The authors should make a proper comparison with [1] and justify how their approach is better.

2. To solidify the experimental results the authors should also experiment with stronger architectures

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 4

Strengths

The authors run experiments on different data types and use model training as the downstream task, which demonstrates that their workflow is robust.

Weaknesses

From a modeling perspective, this is a good paper on adaptive learning, though slightly off-topic for this conference. From a data augmentation perspective, it would be better to include more experiments on downstream tasks that involve high-dimensional data.

Taxonomy

Topics: Machine Learning and Data Classification · Text and Document Classification Technologies · Anomaly Detection Techniques and Applications

Methods: CutMix · Diffusion