SAFLEX: Self-Adaptive Augmentation via Feature Label Extrapolation
Mucong Ding, Bang An, Yuancheng Xu, Anirudh Satheesh, Furong Huang

TL;DR
SAFLEX is an efficient, self-adaptive data augmentation method that learns to optimize augmented sample weights and labels, improving model robustness across diverse datasets and tasks with minimal additional computational cost.
Contribution
We introduce SAFLEX, a novel bilevel optimization-based approach that enhances existing augmentation pipelines by learning the sample weights and labels of augmented samples, reducing noise and errors in the augmented data.
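For readers unfamiliar with the term, a generic bilevel template for learning sample weights and labels can be written as below. This is only an illustrative sketch, not the paper's exact objective: the symbols $w_i$ (sample weights), $\tilde{y}_i$ (learned soft labels), $x_i^{\text{aug}}$ (augmented inputs), $f_\theta$ (the model), $\ell$ (a per-sample loss), and the held-out objective $\mathcal{L}_{\text{held-out}}$ are all assumptions introduced here.

```latex
\begin{aligned}
\min_{w,\,\tilde{y}} \quad & \mathcal{L}_{\text{held-out}}\bigl(\theta^{*}(w, \tilde{y})\bigr) \\
\text{s.t.} \quad & \theta^{*}(w, \tilde{y}) \in \arg\min_{\theta} \sum_{i} w_i \,
  \ell\bigl(f_{\theta}(x_i^{\text{aug}}),\, \tilde{y}_i\bigr)
\end{aligned}
```

The inner problem trains the model on weighted, relabeled augmented samples; the outer problem chooses the weights and labels so that the trained model does well on the outer objective.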
Findings
Effective across natural, medical, and tabular data
Improves few-shot learning and out-of-distribution generalization
Seamlessly integrates with popular augmentation methods
Abstract
Data augmentation, a cornerstone technique in deep learning, is crucial in enhancing model performance, especially with scarce labeled data. While traditional techniques are effective, their reliance on hand-crafted methods limits their applicability across diverse data types and tasks. Although modern learnable augmentation methods offer increased adaptability, they are computationally expensive and challenging to incorporate within prevalent augmentation workflows. In this work, we present a novel, efficient method for data augmentation, effectively bridging the gap between existing augmentation strategies and emerging datasets and learning tasks. We introduce SAFLEX (Self-Adaptive Augmentation via Feature Label EXtrapolation), which learns the sample weights and soft labels of augmented samples provided by any given upstream augmentation pipeline, using a specifically designed…
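To make "sample weights and soft labels of augmented samples" concrete, the following is a minimal sketch of how such quantities could enter a training loss: a weighted cross-entropy against soft labels. It is not the authors' implementation, and it does not show how SAFLEX learns the weights or labels. The function names `log_softmax` and `weighted_soft_label_loss`, and the choice to normalize by the sum of the weights, are assumptions made for illustration.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax of an (n, k) array of logits."""
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def weighted_soft_label_loss(logits, soft_labels, weights):
    """Weighted cross-entropy between model predictions and soft labels.

    logits:      (n, k) model outputs for n augmented samples
    soft_labels: (n, k) non-negative rows that each sum to 1
    weights:     (n,)   non-negative per-sample weights, not all zero
    """
    logits = np.asarray(logits, dtype=float)
    soft_labels = np.asarray(soft_labels, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Per-sample cross-entropy against the soft label distribution.
    per_sample = -(soft_labels * log_softmax(logits)).sum(axis=1)
    # Hypothetical design choice: normalize by the total weight.
    return float((weights * per_sample).sum() / weights.sum())


if __name__ == "__main__":
    logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
    soft_labels = np.array([[0.8, 0.2, 0.0], [0.1, 0.1, 0.8]])
    weights = np.array([1.0, 0.25])  # the second sample is down-weighted
    print(weighted_soft_label_loss(logits, soft_labels, weights))
```

Setting a sample's weight to zero removes it from the loss entirely, which is one way a noisy augmented sample could be suppressed without editing the upstream augmentation pipeline.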
Peer Reviews
Decision: ICLR 2024 poster
The paper is well-written and easy to understand. The diagrams and the equations are easy to follow. The experiments are performed on diverse datasets with various tasks, including medical imaging and tabular data. The results are highly encouraging.
A few important previous works on sampling and purifying GAN synthetic data are relevant to this paper. It is important to acknowledge and discuss their contributions in the paper. Caramalau, Razvan, Binod Bhattarai, and Tae-Kyun Kim. "Sequential graph convolutional network for active learning." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021. Bhattarai, Binod, et al. "Sampling strategies for gan synthetic data." ICASSP 2020-2020 IEEE International Conf
1. The paper's motivation is clear: it identifies two issues with standard augmentation and addresses them by learning sample weights and soft labels.
1. The main issue is a lack of proper baselines. Papers such as [1] have already explored using soft labels for augmentations, where the softness is derived from the augmentation strength. This limits the paper's novelty. There is no comparison with [1] in any of the experiments. The authors should compare properly against [1] and justify how their approach is better.
2. To solidify the experimental results, the authors should also experiment with stronger architectures
The authors run experiments across different data types and downstream model-training tasks, which demonstrates that their workflow is robust.
From a modeling perspective, this is a good paper on the topic of adaptive learning, though slightly off-topic for this conference. From a data augmentation perspective, it would be better to show more experiments on downstream tasks involving high-dimensional data.
Taxonomy
Topics: Machine Learning and Data Classification · Text and Document Classification Technologies · Anomaly Detection Techniques and Applications
Methods: CutMix · Diffusion
