Tracing Privacy Leakage of Language Models to Training Data via Adjusted Influence Functions
Jinxin Liu, Zao Yang

TL;DR
This paper introduces Heuristically Adjusted Influence Functions (HAIF) to trace privacy leakage in language models back to training data. HAIF reduces the overestimated influence of tokens with large gradient norms, which significantly improves tracing accuracy over existing influence functions.
Contribution
The paper proposes HAIF, a novel adjustment to influence functions that enhances privacy leakage tracing accuracy in language models, addressing limitations of current influence estimation methods.
Findings
HAIF improves tracing accuracy by up to 73.71% on the PII-E dataset.
HAIF outperforms state-of-the-art influence functions on real-world data.
The method demonstrates robustness across different prompt and response lengths.
Abstract
The responses generated by Large Language Models (LLMs) can include sensitive information from individuals and organizations, leading to potential privacy leakage. This work implements Influence Functions (IFs) to trace privacy leakage back to the training data, thereby mitigating privacy concerns of Language Models (LMs). However, we notice that current IFs struggle to accurately estimate the influence of tokens with large gradient norms, potentially overestimating their influence. When tracing the most influential samples, this leads to frequently tracing back to samples with large gradient norm tokens, overshadowing the actual most influential samples even if their influences are well estimated. To address this issue, we propose Heuristically Adjusted IF (HAIF), which reduces the weight of tokens with large gradient norms, thereby significantly improving the accuracy of tracing the…
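The down-weighting idea described in the abstract can be illustrated with a toy sketch. This is not the paper's HAIF formula, which the truncated abstract does not give. It assumes a plain gradient-dot-product influence score (the Hessian is replaced by the identity) and a hypothetical token weight of 1 / (1 + gradient norm). The names `sample_gradient`, `influence_scores`, and the `adjust` flag are made up for illustration.

```python
import numpy as np

def sample_gradient(token_grads, adjust=False):
    """Aggregate per-token gradients, shape (n_tokens, n_params), into one vector.

    With adjust=True, each token is scaled by 1 / (1 + ||g||), a hypothetical
    heuristic (not the paper's formula) that shrinks large-norm tokens.
    """
    token_grads = np.asarray(token_grads, dtype=float)
    if not adjust:
        return token_grads.sum(axis=0)
    norms = np.linalg.norm(token_grads, axis=1)
    weights = 1.0 / (1.0 + norms)  # assumed weighting, for illustration only
    return (weights[:, None] * token_grads).sum(axis=0)

def influence_scores(test_grad, train_token_grads, adjust=False):
    """Score each training sample by the dot product with the test gradient.

    This is an identity-Hessian simplification of an influence function.
    """
    test_grad = np.asarray(test_grad, dtype=float)
    return np.array(
        [float(test_grad @ sample_gradient(g, adjust)) for g in train_token_grads]
    )

# Toy data: sample 0 is well aligned with the test gradient but has small
# norms; sample 1 has a single token with a very large gradient norm.
test_grad = [1.0, 0.0]
train_token_grads = [
    [[1.0, 0.0], [0.5, 0.0]],  # sample 0: aligned, small-norm tokens
    [[3.0, 40.0]],             # sample 1: mostly unaligned, large-norm token
]

plain = influence_scores(test_grad, train_token_grads, adjust=False)
adjusted = influence_scores(test_grad, train_token_grads, adjust=True)
print("top sample without adjustment:", int(np.argmax(plain)))
print("top sample with adjustment:", int(np.argmax(adjusted)))
```

In this toy setup the unadjusted score ranks the large-norm sample first, while the adjusted score ranks the aligned sample first. That mirrors the failure mode the abstract describes, but only under the assumptions stated above.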
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Multi-Head Attention · Cosine Annealing · Adam · Layer Normalization · Weight Decay · Dense Connections
