You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas
Haoran Li, Yangqiu Song, Lixin Fan

TL;DR
This paper investigates privacy risks in social chatbots. It shows that speakers' private personas can be inferred from a chatbot's hidden states, and it proposes defense objectives that sharply reduce this leakage while maintaining language generation quality.
Contribution
The paper introduces novel defense objectives that prevent persona leakage from chatbot hidden states, a previously underexplored privacy concern.
Findings
The proposed defense objectives reduce persona inference attack accuracy from 37.6% to 0.5%.
The proposed defenses preserve language generation capabilities.
Extensive experiments validate the effectiveness of the proposed defenses.
Abstract
Social chatbots, also known as chit-chat chatbots, are evolving rapidly with large pretrained language models. Despite this progress, privacy concerns have arisen recently: the training data of large language models can be extracted via model inversion attacks. Moreover, the datasets used for training chatbots contain many private conversations between two individuals. In this work, we further investigate the privacy leakage of the hidden states of chatbots trained by language modeling, which has not yet been well studied. We show that speakers' personas can be inferred with high accuracy through a simple neural network. To address this, we propose effective defense objectives that prevent persona leakage from hidden states. We conduct extensive experiments to demonstrate that our proposed defense objectives can greatly reduce the attack accuracy from 37.6% to 0.5%. Meanwhile, the proposed…
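To make the attack setting in the abstract concrete, here is a minimal sketch of a persona-inference probe: a classifier that sees only hidden-state vectors and tries to predict a persona label. It is not the authors' implementation. The abstract says only that "a simple neural network" is used, so this sketch substitutes a single-layer softmax classifier, and the hidden states are synthetic. The function names and the `leak` parameter (how strongly the persona is encoded in the vector) are hypothetical, introduced only for illustration.

```python
import numpy as np


def make_synthetic_hidden_states(n, dim, num_personas, leak, seed=0):
    """Hypothetical stand-in for chatbot hidden states (NOT real model outputs).

    Each vector is Gaussian noise plus `leak` times a persona-specific direction,
    so leak=0.0 means the vector carries no persona information at all.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, num_personas, size=n)
    centers = rng.normal(size=(num_personas, dim))
    hidden = leak * centers[labels] + rng.normal(size=(n, dim))
    return hidden, labels


def train_persona_probe(hidden, labels, num_personas, lr=0.05, epochs=300, seed=0):
    """Fit a softmax-regression probe with plain full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, dim = hidden.shape
    W = rng.normal(scale=0.01, size=(dim, num_personas))
    b = np.zeros(num_personas)
    onehot = np.eye(num_personas)[labels]
    for _ in range(epochs):
        logits = hidden @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (hidden.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, hidden, labels):
    """Fraction of hidden states whose persona label the probe predicts correctly."""
    preds = (hidden @ W + b).argmax(axis=1)
    return float((preds == labels).mean())


if __name__ == "__main__":
    for leak in (2.0, 0.0):
        hidden, labels = make_synthetic_hidden_states(600, 16, 4, leak=leak)
        W, b = train_persona_probe(hidden[:400], labels[:400], 4)
        acc = probe_accuracy(W, b, hidden[400:], labels[400:])
        print(f"leak={leak}: held-out attack accuracy = {acc:.3f}")
```

On this toy data, the probe recovers the persona label well when `leak` is large and falls to roughly chance level when `leak` is zero. That second case is the outcome a defense would aim for: hidden states from which an attacker's classifier learns nothing about the persona. The sketch does not implement the paper's defense objectives, which the abstract does not specify.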
Taxonomy
Topics: Digital Mental Health Interventions · Privacy, Security, and Data Protection · Persona Design and Applications
