Attributing Data for Sharpness-Aware Minimization

Chenyang Ren; Yifan Jia; Huanyi Xie; Zhaobin Xu; Tianxing Wei; Liangyu Wang; Lijie Hu; Di Wang

arXiv:2507.04059·cs.LG·July 8, 2025

5 Reviews

TL;DR

This paper introduces two novel data attribution methods based on influence functions tailored for Sharpness-Aware Minimization (SAM), enabling effective data influence evaluation despite SAM's complex bilevel optimization structure.

Contribution

The paper develops two influence function-based data valuation methods specifically designed for SAM, addressing the challenges posed by its bilevel optimization and inner loop structure.

Findings

1. Effective in identifying mislabeled data
2. Improves model interpretability
3. Enhances data-driven model editing

Abstract

Sharpness-Aware Minimization (SAM) improves generalization in large-scale model training by linking loss landscape geometry to generalization. However, challenges such as mislabeled noisy data and privacy concerns have emerged as significant issues. Data attribution, which identifies the contributions of specific training samples, offers a promising solution. However, directly applying existing data influence evaluation tools such as influence functions (IF) to SAM is inapplicable or inaccurate, because SAM uses an inner loop to find model perturbations that maximize the loss, which the outer loop then minimizes, resulting in a doubled computational structure. Additionally, this bilevel structure complicates the modeling of data influence on the parameters. In this paper, based on the IF, we develop two innovative data valuation methods for SAM, each offering unique benefits in…
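The inner/outer structure described in the abstract can be illustrated with a minimal sketch of a single SAM update, using the common first-order approximation of the inner maximization (one normalized gradient ascent step of radius `rho`). This is not the paper's attribution method; it only shows the optimizer structure that makes influence estimation harder. The least-squares objective, the function names `grad_mse`, `sam_step`, and `train_sam`, and the values of `rho` and `lr` are all assumptions chosen for illustration.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of the loss 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def sam_step(w, X, y, rho=0.05, lr=0.1):
    """One SAM update with the first-order approximation (illustrative only)."""
    g = grad_mse(w, X, y)
    # Inner step: approximate the loss-maximizing perturbation inside an
    # L2 ball of radius rho by a single normalized gradient ascent step.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Outer step: descend along the gradient evaluated at the perturbed point.
    g_sam = grad_mse(w + eps, X, y)
    return w - lr * g_sam

def train_sam(X, y, steps=500, rho=0.05, lr=0.1):
    """Run repeated SAM updates starting from the zero vector."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = sam_step(w, X, y, rho=rho, lr=lr)
    return w

# Synthetic regression data (hypothetical, for demonstration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)

w_hat = train_sam(X, y)
```

Because each update evaluates the gradient at `w + eps`, and `eps` itself depends on the training data through `g`, removing or reweighting a training sample changes both the perturbation and the descent direction. That coupling is the bilevel dependence the abstract refers to.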

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- The paper is well-structured and clearly written.
- The paper is theoretically grounded.

Weaknesses

- The experiment is limited to the image classification task. It would be more convincing if the authors supplemented the experiments with other tasks as well.
- Thorough comparison against baselines is required both for the experiments and computational complexity.

Reviewer 02 · Rating 4 · Confidence 3

Strengths

I do not have enough background in optimization to know how commonly SAM optimization is used compared with standard SGD / Adam. Assuming that SAM is prevalent in practice, I think the results are interesting: they explore a new and unique challenge and the proposed methods seem to work well in practice.

Weaknesses

Despite this, I think that the paper is lacking in its presentation -- to the point where I could not verify the main results and would not recommend its publication in ICLR in its current form. The biggest problem by far is that the statements and assumptions of the theory section seem vague to me (especially if they are stated as "Theorems"). Moreover, the rest of the paper seems very rough, with both structural issues (lack of intuition / motivation for the theory, the main empirical results

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. Given the widespread use of SAM and IF, the efficient calculation of the influence function for SAM, the research question of this study, is both interesting and highly influential.
2. SAM-HIF and SAM-GIF are grounded in diverse theoretical foundations, and their derivation process is also straightforward to understand.

Weaknesses

1. The mathematical expressions throughout the paper, including the proofs, lack completeness. $w_{k,\delta}$ and $w_{\delta}$, $e_{k,\delta}(w)$ and $e_{\delta}(w)$ are used interchangeably in the paper and proofs. The definition of $L_{S}^{SAM}$ includes a regularization term. Therefore, Eq.(3), which expresses the gradient of $L_{S}^{SAM}$, should include $\lambda w$. In Theorem 4.3, the Identity matrix $I$ is omitted from the definition of $H_w$. I believe the paper requires detailed review

Reviewer 04 · Rating 4 · Confidence 3

Strengths

- The motivation of this paper is clear. SAM is an important optimization method, and extending the influence function to it is important for expanding the applicability of data attribution.
- The presentation (preliminaries on SAM and IF; derivation of SAM-IF, SAM-HIF, and SAM-GIF) is clear and fluent for the reader.
- The results show that the proposed SAM-specific influence functions are effective in several settings.

Weaknesses

- The experiment design is somewhat problematic.
- The evaluation metrics seem to lack clarity. For example, it is hard to judge the effectiveness of the SAM-specific IFs from the "accuracy" shown in Figure 1 and Table 1. The metric in Figure 2 (test accuracy after the removal of harmful training data (with flipped labels)) could show the effectiveness, but only indirectly.
- A suggestion to show this is to use a small model (could be linear) to show that the groundtruth (retr

Reviewer 05 · Rating 4 · Confidence 4

Strengths

1. Exploring the effectiveness of Influence Functions (IF) within the SAM framework is a relatively novel direction.

Weaknesses

1. The contribution of this paper appears limited. The main contribution lies in adapting the Influence Function (IF) framework to the SAM setting, while the derivation closely follows the methodology used in the original IF paper. Moreover, the idea of leveraging gradient trajectories has already been explored in TracIn [1].
2. It is unclear why the authors compare their proposed method with Retrain in all experiments related to efficiency and accuracy. Since the main motivation of this paper
