Revisit, Extend, and Enhance Hessian-Free Influence Functions

Ziao Yang; Han Yue; Jian Chen; Hongfu Liu

arXiv:2405.17490·cs.LG·March 27, 2026

Open Access · 3 Reviews

TL;DR

This paper revisits the TracIn influence estimation method, providing insights into its effectiveness, extending its applications to fairness and robustness, and enhancing it with ensemble strategies for better performance in various deep learning tasks.

Contribution

It offers a deeper understanding of the simple Hessian approximation in TracIn, extends its use to fairness and robustness, and introduces an ensemble enhancement for improved influence estimation.

Findings

1. TracIn performs well despite its naive Hessian approximation.
2. The extended TracIn improves fairness and robustness in models.
3. Ensemble strategies enhance influence estimation accuracy.
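This summary does not say which ensemble strategy the paper uses. As a minimal sketch under an assumed strategy (averaging Hessian-free inner-product scores over several independently trained models or checkpoints), the aggregation step could look like this. The function name, data layout, and toy gradients are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def ensemble_ip_scores(train_grads_per_model, test_grad_per_model):
    """Average inner-product (IP) influence scores over K models.

    Hypothetical layout: train_grads_per_model[k][i] is the loss gradient of
    training sample i under model k, and test_grad_per_model[k] is the
    test-loss gradient under model k. Plain averaging is an ASSUMED ensemble
    rule, not necessarily the one used in the paper.
    """
    k = len(train_grads_per_model)
    n = len(train_grads_per_model[0])
    scores = [0.0] * n
    for grads, g_test in zip(train_grads_per_model, test_grad_per_model):
        for i, g in enumerate(grads):
            scores[i] += dot(g, g_test) / k
    return scores


# Two toy models that disagree about which training sample matters most.
train_grads = [
    [[2.0, 0.0], [0.0, 1.0]],  # model 1: per-sample gradients
    [[1.0, 0.0], [0.0, 1.0]],  # model 2: per-sample gradients
]
test_grads = [[1.0, 0.0], [0.0, 1.0]]
print(ensemble_ip_scores(train_grads, test_grads))  # [1.0, 0.5]
```

Averaging smooths out the model-specific disagreement: model 1 alone favors sample 0 and model 2 alone favors sample 1, while the ensemble keeps both in play.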

Abstract

Influence functions serve as crucial tools for assessing sample influence in model interpretation, training subset selection, noisy label detection, and more. By employing a first-order Taylor expansion, influence functions can estimate sample influence without expensive model retraining. However, applying influence functions directly to deep models is challenging, primarily because the loss function is non-convex and the number of model parameters is large. These difficulties not only make the inverse of the Hessian matrix costly to compute but, in some cases, mean that it does not exist. Various approaches, including matrix decomposition, have been explored to speed up and approximate the inversion of the Hessian matrix, with the aim of making influence functions applicable to deep models. In this paper, we revisit a specific, albeit naive, yet effective…
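The contrast the abstract draws between inverse-Hessian influence functions and a Hessian-free shortcut can be made concrete on a toy problem. The sketch below assumes the standard upweighting form, -grad(z_test)^T H^{-1} grad(z), and an inner-product (IP) variant that replaces H^{-1} with the identity, which is the simple Hessian approximation attributed to TracIn above. The 2x2 Hessian and gradient values are made up for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def inv2x2(h):
    """Invert a 2x2 matrix [[a, b], [c, d]]; fails if it is singular."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        # A singular Hessian has no inverse, one of the failure modes the
        # abstract mentions for deep, non-convex models.
        raise ValueError("Hessian is singular; its inverse does not exist")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [dot(row, v) for row in m]


def influence_hessian(grad_test, grad_train, hessian):
    """Upweighting influence on test loss: -grad_test^T H^{-1} grad_train."""
    return -dot(grad_test, matvec(inv2x2(hessian), grad_train))


def influence_ip(grad_test, grad_train):
    """Hessian-free inner-product approximation: H^{-1} replaced by I."""
    return -dot(grad_test, grad_train)


H = [[2.0, 0.0], [0.0, 0.5]]  # made-up positive-definite Hessian
g_test = [1.0, 2.0]           # made-up test-loss gradient
g_train = [1.0, 0.5]          # made-up training-sample gradient
print(influence_hessian(g_test, g_train, H))  # -2.5
print(influence_ip(g_test, g_train))          # -2.0
```

Both values are negative, so both estimators predict that upweighting this training sample lowers the test loss; only the magnitude differs here, but with other gradients the two estimators can rank samples differently.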

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 5

Strengths

- The paper is well written, and the reader can quickly grasp its main idea.
- Efficiency has become a very important topic for TDA (training data attribution).

Weaknesses

- There is a large gap between the contributions claimed in this paper and the actual literature. The major problem lies in the first and second contribution bullet points in Section 1 (lines 59-62).
- The Inner Product (IP) proposed by this paper (as a simplified version of TracIn) was proposed long ago [1] and has been used in a large number of papers [2].
- Replacing the loss gradient with other metrics for fairness and robustness has also been tried for influence functions or related me…
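The extension the reviewer describes, replacing the test-loss gradient with the gradient of another metric, can be sketched for fairness. Everything below is an assumption for illustration, not the paper's setup: a logistic-regression model, a signed demographic-parity gap (mean predicted probability in group A minus group B) as the differentiable fairness metric, and a single SGD step on one training sample. The score is the first-order change in the gap, -lr * grad(gap) . grad(loss(z)); all function names are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def grad_logloss(w, x, y):
    """Gradient of binary cross-entropy for logistic regression."""
    p = sigmoid(dot(w, x))
    return [(p - y) * xi for xi in x]


def dp_gap(w, group_a, group_b):
    """Signed demographic-parity gap: mean P(y=1) in A minus mean in B."""
    mean_a = sum(sigmoid(dot(w, x)) for x in group_a) / len(group_a)
    mean_b = sum(sigmoid(dot(w, x)) for x in group_b) / len(group_b)
    return mean_a - mean_b


def grad_mean_prob(w, xs):
    """Gradient of the mean predicted probability over a group."""
    g = [0.0] * len(w)
    for x in xs:
        p = sigmoid(dot(w, x))
        for j, xj in enumerate(x):
            g[j] += p * (1.0 - p) * xj / len(xs)
    return g


def grad_dp_gap(w, group_a, group_b):
    """Gradient of dp_gap with respect to the weights."""
    ga = grad_mean_prob(w, group_a)
    gb = grad_mean_prob(w, group_b)
    return [a - b for a, b in zip(ga, gb)]


def fairness_influence(w, z_train, group_a, group_b, lr=1.0):
    """First-order change in the gap after one SGD step on z_train.

    The test-loss gradient of ordinary IP influence is swapped for the
    gradient of the (assumed) fairness metric.
    """
    return -lr * dot(grad_dp_gap(w, group_a, group_b), grad_logloss(w, *z_train))


w = [0.5, -0.25]                      # made-up current weights
group_a = [[1.0, 0.0], [1.0, 1.0]]    # made-up features, group A
group_b = [[0.0, 1.0], [0.5, 0.5]]    # made-up features, group B
z = ([1.0, 1.0], 0.0)                 # one training sample (x, y)
print(fairness_influence(w, z, group_a, group_b, lr=1e-4))
```

A negative score means one gradient step on `z` is predicted to shrink the gap; a robustness variant would swap in the gradient of a robustness metric in the same place.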

Reviewer 02 · Rating 3 · Confidence 4

Strengths

The paper includes a lot of experiments and provides statistical ranges for the reported results.

Weaknesses

However, the paper lacks a clear problem statement and a clear positioning of the method among existing approaches. The formula used to describe the method is the TracIn formula, the only difference being that the computation is restricted to the final trained model. This raises several questions: (i) the explanation of the benefits of this simplification is vague and debatable, (ii) the intuition behind this approximation rests on a comparison with a method involving the Hessian's invers…
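The difference Reviewer 02 points to can be seen side by side. TracIn's checkpoint variant sums learning-rate-weighted inner products of training and test loss gradients over saved checkpoints, while the simplification keeps only the final model's inner product. This sketch assumes a linear model with squared loss, plain SGD, and a checkpoint saved after each epoch; the data and function names are made up. Under the first-order view, a positive score marks a proponent (a gradient step on the training sample would have reduced the test loss).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def grad_sq_loss(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w."""
    r = dot(w, x) - y
    return [r * xi for xi in x]


def train_with_checkpoints(data, lr, epochs):
    """Plain SGD on a linear model, saving the weights after every epoch."""
    w = [0.0] * len(data[0][0])
    checkpoints = []
    for _ in range(epochs):
        for x, y in data:
            g = grad_sq_loss(w, x, y)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
        checkpoints.append(list(w))
    return checkpoints


def tracin_cp(checkpoints, lr, z_train, z_test):
    """TracIn over saved checkpoints: sum_t lr * grad(z_train) . grad(z_test)."""
    return sum(
        lr * dot(grad_sq_loss(w, *z_train), grad_sq_loss(w, *z_test))
        for w in checkpoints
    )


def ip_final(checkpoints, z_train, z_test):
    """Final-model-only inner product (the simplification the reviewer describes)."""
    w = checkpoints[-1]
    return dot(grad_sq_loss(w, *z_train), grad_sq_loss(w, *z_test))


data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.0)]
ckpts = train_with_checkpoints(data, lr=0.1, epochs=5)
z_test = ([1.0, 1.0], 3.5)
for z in data:
    print(z, tracin_cp(ckpts, 0.1, z, z_test), ip_final(ckpts, z, z_test))
```

With a single checkpoint and a unit learning rate the two quantities coincide, which is one way to read the reviewer's remark that the method is TracIn reduced to the final trained model.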

Reviewer 03 · Rating 3 · Confidence 3

Strengths

1. The proposed method is particularly simple, easy to implement, and efficient to compute.
2. The experimental evaluations presented in the paper are fairly thorough and rigorous, with appropriate repeat experiments to establish confidence intervals.
3. The extension of influence estimation to algorithmic fairness metrics is interesting.

Weaknesses

1. The order-consistency argument for why IP is a good approximation to inverse-Hessian influence is quite weak. In Figure 1, data points with vectors in regions I and III would not satisfy order consistency. The authors argue that since IP and IF both rate such points as beneficial/detrimental, the order doesn't matter. This is clearly not true, as many applications of influence estimation involve determining set membership at the extreme ends of the influence spectrum (e.g. removing x% detrime…
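Reviewer 03's objection can be checked numerically. In this toy case, with a made-up diagonal Hessian and made-up gradients, the inverse-Hessian score (IF) and the inner-product score (IP) agree on the sign of every training point, yet they rank the points in opposite orders, so a "remove the top x%" rule would select different points. The helpfulness score used here is grad(z_test)^T H^{-1} grad(z_train), the negative of the upweighting influence, so larger means upweighting the point is predicted to lower the test loss more.

```python
def helpfulness_if(g_test, g_train, hessian_diag):
    """g_test^T H^{-1} g_train for a diagonal Hessian (larger = more helpful)."""
    return sum(t * g / h for t, g, h in zip(g_test, g_train, hessian_diag))


def helpfulness_ip(g_test, g_train):
    """Hessian-free version: H^{-1} replaced by the identity."""
    return sum(t * g for t, g in zip(g_test, g_train))


hessian_diag = [4.0, 1.0]  # made-up curvature: first direction is stiffer
g_test = [1.0, 1.0]
train = {"a": [4.0, 0.0], "b": [0.0, 2.0], "c": [1.0, 1.5]}

if_scores = {k: helpfulness_if(g_test, g, hessian_diag) for k, g in train.items()}
ip_scores = {k: helpfulness_ip(g_test, g) for k, g in train.items()}

# Every score is positive under both methods, so the signs agree...
assert all(s > 0 for s in if_scores.values())
assert all(s > 0 for s in ip_scores.values())

# ...but the rankings are reversed.
rank_if = sorted(train, key=if_scores.get, reverse=True)
rank_ip = sorted(train, key=ip_scores.get, reverse=True)
print(rank_if)  # ['b', 'c', 'a']
print(rank_ip)  # ['a', 'c', 'b']
```

The reversal comes entirely from the curvature: IP ignores that the first direction is four times stiffer, so it overrates point `a`, whose gradient lies along that direction.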

Taxonomy

Topics: Parallel Computing and Optimization Techniques

Methods: Sparse Evolutionary Training