Agnostic Sharpness-Aware Minimization
Van-Anh Nguyen, Quyen Tran, Tuan Truong, Thanh-Toan Do, Dinh Phung, Trung Le

TL;DR
Agnostic-SAM combines sharpness-aware minimization and meta-learning principles to find flatter, more robust minima, significantly enhancing model generalization especially in noisy or limited data scenarios.
Contribution
This paper introduces Agnostic-SAM, a novel method that integrates SAM and MAML to improve model robustness and generalization across diverse tasks and data conditions.
Findings
Agnostic-SAM outperforms baseline methods on multiple datasets.
It achieves better robustness under noisy labels.
It improves generalization with limited data.
Abstract
Sharpness-aware minimization (SAM) has been instrumental in improving deep neural network training by minimizing both the training loss and the sharpness of the loss landscape, leading the model into flatter minima that are associated with better generalization properties. Separately, Model-Agnostic Meta-Learning (MAML) is a framework designed to improve the adaptability of models. MAML optimizes a set of meta-models that are specifically tailored for quick adaptation to multiple tasks with minimal fine-tuning steps and can generalize well with limited data. In this work, we explore the connection between SAM and MAML in enhancing model generalization. We introduce Agnostic-SAM, a novel approach that combines the principles of both SAM and MAML. Agnostic-SAM adapts the core idea of SAM by optimizing the model toward wider local minima using training data, while concurrently…
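For reference, the standard SAM update that the abstract builds on can be sketched as follows. This is a minimal illustration of plain SAM only, not of Agnostic-SAM, whose exact update rule is not given on this page. The toy quadratic loss, the function names (`loss`, `grad`, `sam_step`), and the values of `lr` and `rho` are all assumptions chosen for illustration.

```python
import math

def loss(w):
    # Toy quadratic loss with one sharp and one flat direction (illustrative assumption).
    return 0.5 * (10.0 * w[0] ** 2 + 0.1 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], 0.1 * w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    """One standard SAM update: perturb toward higher loss, then descend from w."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    # Ascent step: perturbation of L2 norm rho along the gradient direction.
    eps = [rho * gi / norm for gi in g]
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    # Descent step: apply the gradient taken at the perturbed point to the original weights.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
for _ in range(100):
    w = sam_step(w)
print(loss(w))
```

With `rho=0` the perturbation vanishes and `sam_step` reduces to an ordinary gradient-descent step, which is a quick way to check the sketch against a known baseline.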
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- The motivation to combine the element of sharpness minimization with meta-learning for better generalization makes sense. This is operationalized well in the form of an algorithm that is shown to perform slightly better than the baselines.
- The method seems to be extensively tested in supervised learning setups, meta-learning scenarios, and settings with label noise.
- Difference with respect to Abbas et al. 2022: Compared with this prior work, it is unclear what the novelty is here. The authors mention this paper but do not explain the similarities or differences. The method here looks **eerily similar** to the two-year-old prior work, which is arguably better written and presented and much richer. Except for small bits of analysis on congruence between gradients, I can't spot much of a methodological difference.
- Supervised learning experiment
- SAM and MAML have both been found to be effective for enhancing generalization performance, and it is encouraging that the paper attempts to explore their intersection.
- The paper follows a standard procedure to evaluate the proposed method (Agnostic-SAM) and shows its effectiveness in experiments.
There are several concerns about this paper, summarized as follows.
Method
- The main idea and motivation of this work, in its current form, remain quite random. SAM and MAML are two of many potential ways to improve generalization performance, but without clearly justifying why these two, the paper simply combines the two approaches and ends up providing experimental results. This diminishes the technical contribution and novelty.
- The authors also claim that it is a "framework", but with it being the sim
The paper provides a comprehensive evaluation of Agnostic-SAM across a wide range of tasks, including image classification, transfer learning, training with label noise, and meta-learning.
1. The motivation for the problem formulation in Equation 3 is not convincingly justified. It would benefit from a clearer explanation of why this specific formulation was chosen and how it directly leads to generalization.
2. The paper does not sufficiently clarify how the integration of MAML’s insights with the proposed problem formulation and algorithm specifically aids generalization. A deeper theoretical or empirical justification is needed.
3. The proposed algorithm assumes the existence o
- The proposed method requires an additional hyperparameter, but the authors found a way of setting it consistently throughout their experiments: $\rho_1 = 2\rho_2$.
- Agnostic-SAM improves over baselines in most cases (even though I have doubts about the setups, see below).
- Combining ideas from MAML and SAM is a creative approach.
**Comparison to Baselines**
At its core, Agnostic-SAM changes the perturbation step of SAM by adding an additional perturbation based on gradients from a separate, smaller data batch, and the authors claim improved generalization performance. However, several methods have proposed adjustments to SAM’s perturbation model with improved generalization performance. Most similar to Agnostic-SAM, [1] adds random perturbations to the gradient-based perturbation, while [2] and [3] perform multi-step pe
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
Methods: Sparse Evolutionary Training · Model-Agnostic Meta-Learning · Segment Anything Model
