PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models
Guangwei Li, Yuansen Zhang, Yinggui Wang, Shoumeng Yan, Lei Wang, Tao Wei

TL;DR
PRIV-QA introduces a privacy-preserving pipeline and a new dataset for secure question answering with large language models, balancing user privacy with interaction quality in cloud-based scenarios.
Contribution
The paper presents the first privacy open-ended QA dataset and a multi-stage privacy preservation method for cloud LLMs, enhancing privacy without sacrificing response quality.
Findings
Effective privacy protection while maintaining response quality
Construction of SensitiveQA dataset with 57k interactions
Validated approach through experimental results
Abstract
The rapid development of large language models (LLMs) is redefining the landscape of human-computer interaction, and their integration into various user-service applications is becoming increasingly prevalent. However, transmitting user data to cloud-based LLMs presents significant risks of data breaches and unauthorized access to personal identification information. In this paper, we propose a privacy preservation pipeline for protecting privacy and sensitive information during interactions between users and LLMs in practical LLM usage scenarios. We construct SensitiveQA, the first privacy open-ended question-answering dataset. It comprises 57k interactions in Chinese and English, encompassing a diverse range of user-sensitive information within the conversations. Our proposed solution employs a multi-stage strategy aimed at preemptively securing user information while simultaneously…
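The abstract is truncated here and does not spell out the individual stages, so the following is only a minimal sketch of the general idea of securing user information before it reaches a cloud LLM. It is not the authors' pipeline. The regex patterns, the placeholder format, and the names `sanitize`, `restore`, `private_qa`, and `cloud_llm` are all assumptions made for illustration; a practical system would likely rely on a learned detector rather than regular expressions.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical detectors for two kinds of sensitive spans (illustration only).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d -]{7,}\d"),
}


def sanitize(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace detected sensitive spans with placeholders.

    Returns the sanitized text and a placeholder -> original value mapping
    that never leaves the local machine.
    """
    mapping: Dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def _substitute(match: "re.Match[str]", label: str = label) -> str:
            value = match.group(0)
            # Reuse the same placeholder when a value appears more than once.
            for placeholder, original in mapping.items():
                if original == value:
                    return placeholder
            placeholder = f"<{label}_{len(mapping)}>"
            mapping[placeholder] = value
            return placeholder

        text = pattern.sub(_substitute, text)
    return text, mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back into a response that uses placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


def private_qa(question: str, cloud_llm: Callable[[str], str]) -> str:
    """Sanitize locally, query the cloud model, then restore locally.

    `cloud_llm` is a hypothetical callable standing in for any remote model.
    """
    sanitized, mapping = sanitize(question)
    answer = cloud_llm(sanitized)
    return restore(answer, mapping)
```

In this sketch the cloud model only ever sees placeholders such as `<EMAIL_0>`, while the mapping needed to recover the original values stays on the user's side. Passing a stub such as `lambda prompt: prompt` as `cloud_llm` is enough to check the round trip without calling any remote service.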
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Privacy-Preserving Technologies in Data
