Influence-based Attributions can be Manipulated

Chhavi Yadav; Ruihan Wu; Kamalika Chaudhuri

arXiv:2409.05208·cs.LG·October 8, 2024

TL;DR

This paper demonstrates that influence-based attribution methods, which attribute model predictions to training data, can be systematically manipulated by an adversary. The attacks are shown for logistic regression models trained on ResNet feature embeddings and on standard tabular fairness datasets, raising concerns about the reliability of these attributions.

Contribution

The authors show that influence-based attributions are vulnerable to manipulation and provide efficient attack methods demonstrating this vulnerability.

Findings

01

Influence functions can be tampered with in logistic regression models.

02

Adversaries can systematically manipulate attributions on standard datasets.

03

The work questions the robustness of influence-based explanations in adversarial settings.

Abstract

Influence Functions are a standard tool for attributing predictions to training data in a principled manner and are widely used in applications such as data valuation and fairness. In this work, we present realistic incentives to manipulate influence-based attributions and investigate whether these attributions can be systematically tampered with by an adversary. We show that this is indeed possible for logistic regression models trained on ResNet feature embeddings and standard tabular fairness datasets, and we provide efficient attacks with backward-friendly implementations. Our work raises questions about the reliability of influence-based attributions in adversarial circumstances. Code is available at: https://github.com/infinite-pursuits/influence-based-attributions-can-be-manipulated
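
For readers unfamiliar with the quantity being attacked, the following is a minimal sketch of influence scores for an L2-regularized logistic regression, using the common formulation score_i = -grad L(z_test)^T H^{-1} grad L(z_i). It is not taken from the authors' repository: the function names (`fit_logreg`, `influence_scores`), the regularization strength `l2`, the use of Newton's method, and the choice to leave the regularizer out of per-sample gradients are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logreg(X, y, l2=1e-2, steps=50):
    """Fit L2-regularized logistic regression (labels in {0, 1}) with Newton steps.

    Hypothetical helper for this sketch; any solver that reaches the optimum works.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n + l2 * w
        hess = (X.T * (p * (1 - p))) @ X / n + l2 * np.eye(d)
        w -= np.linalg.solve(hess, grad)
    return w

def influence_scores(X, y, w, x_test, y_test, l2=1e-2):
    """Return one influence score per training point on the test point's loss.

    Assumption: per-sample gradients exclude the regularizer, while the
    Hessian is that of the full regularized training objective.
    """
    n, d = X.shape
    p = sigmoid(X @ w)
    hess = (X.T * (p * (1 - p))) @ X / n + l2 * np.eye(d)
    train_grads = X * (p - y)[:, None]                    # shape (n, d)
    test_grad = x_test * (sigmoid(x_test @ w) - y_test)   # shape (d,)
    # Solve H v = test_grad once, then take a dot product per training point.
    return -train_grads @ np.linalg.solve(hess, test_grad)
```

Under this sign convention a negative score means that upweighting the training point would lower the test loss. An attack of the kind the abstract describes would search for alternative model parameters that change these scores in the adversary's favor; that search is not shown here.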

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 4

Strengths

1. The paper is easy to follow, and the results and methodology are clearly presented in tables and figures/diagrams.
2. The overall idea of studying vulnerabilities in influence functions is interesting, especially as there are some applications built on using influence scores. However, I have some concerns, which are discussed in the following points.

Weaknesses

1. The first and most important point to raise is that the attack setup does not seem practical in the real world. The paper provides a framework that includes two components: 1) Data Provider and 2) Influence Calculator. It is assumed that the data provider is truthful, while the Influence Calculator, which trains a "model" and calculates the influence scores, is maliciously manipulated. I can't think of a real-world scenario in which an Influence Calculator would deliberately aim to manipulate

Reviewer 02 · Rating 3 · Confidence 3

Strengths

- The paper is well-written and well-organized.
- The paper provides a practical and efficient algorithm to compute the backward pass through Hessian-Inverse-Vector Products for influence-based objectives.
- The experimental results show that influence-based attacks, both targeted and untargeted, can successfully manipulate influence scores while maintaining model accuracy.

Weaknesses

- The setting of this work is confusing to me. It presents a scenario where an adversary can manipulate the model training process to achieve desired influence scores. Is this a realistic scenario in practice, and why would an adversary manipulate the influence scores when they could manipulate the model itself?
- The method aims to manipulate the influence scores while maintaining similar test accuracy, and this accuracy is measured by a `dist` function between the parameters and limited by a r

Reviewer 03 · Rating 1 · Confidence 4

Strengths

- The paper conducts experiments on multiple datasets.
- The concept of adversarial attacks on influence scores is an interesting idea overall.

Weaknesses

In my opinion, the paper possesses major limitations that invalidate its contributions. My major concern revolves around the proposed threat model itself, which seems untenable for real-world practical applications. Other issues are concerned with limited model and dataset evaluation, and the overall simplicity of the work:
- **Unjustifiable Threat Model**: The threat model makes multiple assumptions about the influence (and data valuation) problem pipelines in order to make the attack viable.

Code & Models

Repositories

infinite-pursuits/influence-based-attributions-can-be-manipulated · Official

Taxonomy

Topics: Opinion Dynamics and Social Influence · Multi-Agent Systems and Negotiation

Methods: Average Pooling · Max Pooling · Kaiming Initialization · Logistic Regression · Convolution · Global Average Pooling