Privacy Bias in Language Models: A Contextual Integrity-based Auditing Metric
Yan Shvartzshnaider, Vasisht Duddu

TL;DR
This paper introduces a new auditing metric based on contextual integrity to evaluate privacy biases in large language models, helping stakeholders assess ethical and societal impacts.
Contribution
It presents a novel methodology for reliably measuring privacy biases in LLM responses, considering prompt sensitivity and model factors.
Findings
Privacy bias varies with model capacity and optimization.
The proposed metric effectively detects privacy violations.
Sensitivity to prompt variations influences privacy bias assessments.
Abstract
As large language models (LLMs) are integrated into sociotechnical systems, it is crucial to examine the privacy biases they exhibit. We define privacy bias as the appropriateness value of information flows in responses from LLMs. A deviation between privacy biases and expected values, referred to as privacy bias delta, may indicate privacy violations. As an auditing metric, privacy bias can help (a) model trainers evaluate the ethical and societal impact of LLMs, (b) service providers select context-appropriate LLMs, and (c) policymakers assess the appropriateness of privacy biases in deployed LLMs. We formulate and answer a novel research question: how can we reliably examine privacy biases in LLMs and the factors that influence them? We present a novel approach for assessing privacy biases using a contextual integrity-based methodology to evaluate the responses from various LLMs. Our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling
MethodsALIGN
