On Privacy and Confidentiality of Communications in Organizational Graphs
Masoumeh Shafieinejad, Huseyin Inan, Marcello Hasegawa, and Robert Sim

TL;DR
This paper explores how to preserve confidentiality in organizational communication data used for machine learning, proposing a model that accounts for social network correlations to improve privacy guarantees without overly sacrificing utility.
Contribution
It introduces a novel privacy framework based on Pufferfish principles that considers social network correlations, bridging the gap between record-level and group privacy in NLP tasks.
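For readers unfamiliar with Pufferfish, the general definition it refers to can be written as follows. This is the standard textbook form of Pufferfish privacy, not the paper's specific instantiation; the symbols S, Q, Θ, and M are the usual ones from that definition and are not taken from the summary above.

```latex
% Standard \epsilon-Pufferfish privacy (general definition; an assumption about
% which form the paper builds on, since the summary only says "Pufferfish principles").
% S:      set of potential secrets
% Q:      set of secret pairs (s_i, s_j) that must remain indistinguishable
% \Theta: set of plausible data distributions; correlation between records is encoded here
\[
  e^{-\epsilon} \;\le\;
  \frac{\Pr[\mathcal{M}(X) = w \mid s_i, \theta]}
       {\Pr[\mathcal{M}(X) = w \mid s_j, \theta]}
  \;\le\; e^{\epsilon}
\]
% required for every output w \in \mathrm{Range}(\mathcal{M}), every pair (s_i, s_j) \in Q,
% and every \theta \in \Theta with \Pr[s_i \mid \theta] > 0 and \Pr[s_j \mid \theta] > 0.
```

Record-level differential privacy corresponds to the special case where the distributions in Θ treat records as independent, which is the assumption the paper questions.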
Findings
Naive differential privacy approaches overestimate privacy guarantees.
Correlation-aware privacy models improve utility while maintaining confidentiality.
A middle-ground solution balances privacy and model performance.
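The first finding can be made concrete with the standard group-privacy property of differential privacy: if one confidential fact influences k records, a record-level guarantee of ε only bounds the leakage about that fact by k·ε in the worst case. The sketch is a minimal illustration of that arithmetic, not the paper's model; the function names and the example numbers are hypothetical.

```python
import math


def effective_epsilon(record_epsilon: float, correlated_records: int) -> float:
    """Worst-case privacy loss for a secret that influences several records.

    Uses the standard group-privacy bound: a mechanism that is
    record_epsilon-DP per record is (k * record_epsilon)-DP for a
    group of k records. Assumes the secret can affect all k records.
    """
    if record_epsilon < 0:
        raise ValueError("record_epsilon must be non-negative")
    if correlated_records < 1:
        raise ValueError("correlated_records must be at least 1")
    return record_epsilon * correlated_records


def max_likelihood_ratio(epsilon: float) -> float:
    """Upper bound e^epsilon on how much any output's probability can shift."""
    return math.exp(epsilon)


# Hypothetical example: a nominal per-email guarantee of epsilon = 0.5,
# where one confidential topic is discussed across 4 emails.
nominal = 0.5
effective = effective_epsilon(nominal, correlated_records=4)
print(f"nominal epsilon:   {nominal}")
print(f"effective epsilon: {effective}")
print(f"likelihood-ratio bound grows from "
      f"{max_likelihood_ratio(nominal):.2f} to {max_likelihood_ratio(effective):.2f}")
```

Full group privacy sits at the pessimistic end of this range and record-level privacy at the optimistic end; the middle-ground solution mentioned in the findings lies between the two.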
Abstract
Machine learned models trained on organizational communication data, such as emails in an enterprise, carry unique risks of breaching confidentiality, even if the model is intended only for internal use. This work shows how confidentiality is distinct from privacy in an enterprise context, and aims to formulate an approach to preserving confidentiality while leveraging principles from differential privacy. The goal is to perform machine learning tasks, such as learning a language model or performing topic analysis, using interpersonal communications in the organization, while not learning about confidential information shared in the organization. Works that apply differential privacy techniques to natural language processing tasks usually assume independently distributed data, and overlook potential correlation among the records. Ignoring this correlation results in a fictional promise…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Ethics and Social Impacts of AI
