Optimizing Data Augmentation through Bayesian Model Selection
Madi Matymov (1), Ba-Hien Tran (2), Michael Kampffmeyer (3, 4), Markus Heinonen (5), Maurizio Filippone (1) ((1) KAUST, (2) Huawei Paris Research Center, (3) UiT The Arctic University of Norway, (4) Norwegian Computing Center, (5) Aalto University)

TL;DR
This paper introduces a Bayesian framework for optimizing data augmentation parameters, improving model robustness and calibration by jointly learning augmentation and model parameters through a variational approach.
Contribution
It presents a novel probabilistic method that treats augmentation parameters as hyperparameters and optimizes them via a tractable ELBO, grounded in Bayesian model selection.
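As a rough illustration of what "Bayesian model selection via an ELBO" means in general, the standard bound can be written as below. The symbols are assumptions made here for illustration: $\phi$ for augmentation parameters, $\theta$ for model parameters, $\mathcal{D}$ for the data, and $q(\theta)$ for a variational posterior. The paper's actual objective may contain additional terms.

```latex
\begin{align}
\phi^\star &= \arg\max_{\phi} \, \log p(\mathcal{D} \mid \phi), \\
\log p(\mathcal{D} \mid \phi)
  &= \log \int p(\mathcal{D} \mid \theta, \phi)\, p(\theta)\, d\theta \\
  &\ge \mathbb{E}_{q(\theta)}\!\left[ \log p(\mathcal{D} \mid \theta, \phi) \right]
     - \mathrm{KL}\!\left( q(\theta) \,\|\, p(\theta) \right).
\end{align}
```

The inequality follows from Jensen's inequality. Maximizing the right-hand side over both $q$ and $\phi$ is what allows hyperparameters to be learned jointly with the model.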
Findings
Enhanced model calibration and robustness in vision and NLP tasks
Outperforms fixed or no augmentation strategies in experiments
Provides theoretical guarantees on approximation quality and generalization
Abstract
Data Augmentation (DA) has become an essential tool to improve robustness and generalization of modern machine learning. However, when deciding on DA strategies it is critical to choose parameters carefully, and this can be a daunting task which is traditionally left to trial-and-error or expensive optimization based on validation performance. In this paper, we counter these limitations by proposing a novel framework for optimizing DA. In particular, we take a probabilistic view of DA, which leads to the interpretation of augmentation parameters as model (hyper)-parameters, and the optimization of the marginal likelihood with respect to these parameters as a Bayesian model selection problem. Due to its intractability, we derive a tractable ELBO, which allows us to optimize augmentation parameters jointly with model parameters. We provide extensive theoretical results on variational…
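The abstract is truncated here, so the exact ELBO is not shown. As a minimal sketch of the general idea only, the following evaluates a Monte Carlo ELBO for a toy logistic regression in which the augmentation is additive Gaussian noise. Everything in it is a hypothetical stand-in rather than the authors' method: the function names, the additive-noise augmentation, the standard-normal prior, and the absence of any regularizer on the augmentation scale are all assumptions.

```python
import numpy as np

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return -np.logaddexp(0.0, -z)

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    # KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2)), summed over dimensions.
    return np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2)
        - 0.5
    )

def elbo_estimate(x, y, w_mu, w_sigma, aug_scale, n_samples=8, seed=0):
    """Monte Carlo ELBO for logistic regression with noise augmentation.

    Assumed (hypothetical) setup: weights follow q = N(w_mu, w_sigma^2),
    the prior is N(0, 1), and the augmentation maps x to
    x + aug_scale * eps with eps ~ N(0, I).
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    total = 0.0
    for _ in range(n_samples):
        # Reparameterized draw of the weights.
        w = w_mu + w_sigma * rng.standard_normal(d)
        # Reparameterized draw of the augmented inputs.
        x_aug = x + aug_scale * rng.standard_normal((n, d))
        logits = x_aug @ w
        # Bernoulli log-likelihood with labels y in {0, 1}.
        total += np.sum(y * log_sigmoid(logits) + (1 - y) * log_sigmoid(-logits))
    expected_loglik = total / n_samples
    kl = kl_diag_gaussians(w_mu, w_sigma, np.zeros(d), np.ones(d))
    return expected_loglik - kl
```

Because both random draws are written as deterministic functions of standard-normal noise, an automatic-differentiation framework could in principle take gradients of this quantity with respect to `w_mu`, `w_sigma`, and `aug_scale` together, which is the sense in which augmentation and model parameters can be optimized jointly.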
Peer Reviews
Decision: ICLR 2026 Poster
- The theoretical results are extensive and address multiple aspects of this framework. The work appears to give practical insight into which quantities and which choices of distribution over data augmentation parameters affect various aspects of model performance within this framework of data augmentation via Bayesian optimization. The corollaries that follow the theorems throughout the text give insight and practical advice, and seem to cover many different lenses through which to understand how th
- The mathematical imprecision noted in previous reviews.
- Some experimental results may confuse readers. In Section 5.1, Figure 2, the “test loss” figure is unclear and does not clearly show what the text says. Also, a reference to Appendix F.2, or including it in the main body, would be appropriate to explain why ResNet achieves less than 80% accuracy in Table 2.
The authors address the interesting, useful task of data augmentation as a technique that helps improve model performance across different downstream tasks. The authors provide solid theoretical foundations for their proposed method, OPTIMA, highlighting the types of invariances it promotes and the uncertainty quantification it enables. They also present experiments on standard datasets such as ImageNet, showing that OPTIMA slightly outperforms methods that do not use data augmentation
Methods like bilevel optimization [1] also aim to optimize data augmentation functions, and it would be nice to show how OPTIMA differs from these related approaches in terms of how well they perform augmentation. The field of image classification is already quite saturated; it would be more compelling to see results on harder tasks such as segmentation or object detection. While data augmentation optimization is valuable, it doesn’t provide fundamentally new information, unl
- The paper proposes a Bayesian framework for the joint optimization of data augmentation parameters and model parameters.
- The theoretical analysis is comprehensive, including approximation bounds, generalization guarantees, and invariance derivations.
- The paper primarily focuses on computer vision tasks (such as image classification) and does not extend to other modalities (such as natural language or time-series data), which limits its generalizability claims.
- The compared algorithms are outdated.
- The impact of the dimensionality of data augmentation parameters on the proposed method is not discussed.
- Some theorems rely on mathematical assumptions that are difficult to apply in practice.
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference
