PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models

Guangwei Li; Yuansen Zhang; Yinggui Wang; Shoumeng Yan; Lei Wang; Tao Wei

arXiv:2502.13564 · cs.CL · February 20, 2025

TL;DR

PRIV-QA introduces a privacy-preserving pipeline and a new dataset for secure question answering with large language models, balancing user privacy with interaction quality in cloud-based scenarios.

Contribution

The paper presents the first privacy open-ended QA dataset and a multi-stage privacy preservation method for cloud LLMs, enhancing privacy without sacrificing response quality.

Findings

01. Effective privacy protection while maintaining response quality

02. Construction of SensitiveQA dataset with 57k interactions

03. Validated approach through experimental results

Abstract

The rapid development of large language models (LLMs) is redefining the landscape of human-computer interaction, and their integration into various user-service applications is becoming increasingly prevalent. However, transmitting user data to cloud-based LLMs presents significant risks of data breaches and unauthorized access to personal identification information. In this paper, we propose a privacy preservation pipeline for protecting privacy and sensitive information during interactions between users and LLMs in practical LLM usage scenarios. We construct SensitiveQA, the first privacy open-ended question-answering dataset. It comprises 57k interactions in Chinese and English, encompassing a diverse range of user-sensitive information within the conversations. Our proposed solution employs a multi-stage strategy aimed at preemptively securing user information while simultaneously…

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Privacy-Preserving Technologies in Data