Tracing Privacy Leakage of Language Models to Training Data via Adjusted Influence Functions

Jinxin Liu; Zao Yang

arXiv:2408.10468 · cs.LG · September 6, 2024

TL;DR

This paper introduces Heuristically Adjusted Influence Functions (HAIF) to trace privacy leakage in language models back to the training data. HAIF improves tracing accuracy over existing influence functions by reducing the weight of tokens with large gradient norms, whose influence those methods tend to overestimate.

Contribution

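The paper proposes HAIF, an adjustment to influence functions that reduces the weight of tokens with large gradient norms. This addresses the tendency of current influence functions to overestimate the influence of such tokens and improves the accuracy of privacy leakage tracing in language models.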

Findings

1. HAIF improves tracing accuracy by up to 73.71% on the PII-E dataset.

2. HAIF outperforms state-of-the-art influence functions on real-world data.

3. The method is robust across different prompt and response lengths.

Abstract

The responses generated by Large Language Models (LLMs) can include sensitive information from individuals and organizations, leading to potential privacy leakage. This work implements Influence Functions (IFs) to trace privacy leakage back to the training data, thereby mitigating privacy concerns of Language Models (LMs). However, we notice that current IFs struggle to accurately estimate the influence of tokens with large gradient norms, potentially overestimating their influence. When tracing the most influential samples, this leads to frequently tracing back to samples with large gradient norm tokens, overshadowing the actual most influential samples even if their influences are well estimated. To address this issue, we propose Heuristically Adjusted IF (HAIF), which reduces the weight of tokens with large gradient norms, thereby significantly improving the accuracy of tracing the…
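
The effect described in the abstract can be illustrated with a minimal sketch. It is not the paper's HAIF formula, which this page does not give. It assumes a simplified influence score (the sum of dot products between per-token training gradients and a test gradient) and a hypothetical adjustment that scales each token's contribution by the inverse of its gradient norm. The function names, the weighting, and the toy gradients are all assumptions made for illustration.

```python
import numpy as np

def plain_influence(token_grads, test_grad):
    """Unadjusted score: sum over tokens of <token gradient, test gradient>."""
    return float((token_grads @ test_grad).sum())

def adjusted_influence(token_grads, test_grad, eps=1e-12):
    """Hypothetical adjustment (an assumption, not the paper's formula):
    scale each token's contribution by 1 / (L2 norm of its gradient)."""
    norms = np.linalg.norm(token_grads, axis=1)
    weights = 1.0 / (norms + eps)
    return float(((token_grads @ test_grad) * weights).sum())

# Toy test gradient (e.g. of the loss on a leaked response).
test_grad = np.array([1.0, 0.0])

# Sample A: two tokens with modest gradients, fully aligned with test_grad.
sample_a = np.array([[1.0, 0.0],
                     [1.0, 0.0]])

# Sample B: one token with a very large gradient norm, only weakly aligned.
sample_b = np.array([[10.0, 100.0],
                     [0.0, 1.0]])

samples = {"A": sample_a, "B": sample_b}

# Rank training samples by each score, most influential first.
plain_rank = sorted(samples, key=lambda k: plain_influence(samples[k], test_grad), reverse=True)
adjusted_rank = sorted(samples, key=lambda k: adjusted_influence(samples[k], test_grad), reverse=True)

print("plain ranking:   ", plain_rank)
print("adjusted ranking:", adjusted_rank)
```

In this toy setup the unadjusted score ranks sample B first purely because of its large-norm token, while the down-weighted score ranks the well-aligned sample A first. This is the kind of overshadowing the abstract attributes to current influence functions.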

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Multi-Head Attention · Cosine Annealing · Adam · Layer Normalization · Weight Decay · Dense Connections