Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework

Imdad Ullah; Najm Hassan; Sukhpal Singh Gill; Basem Suleiman; Tariq Ahamed Ahanger; Zawar Shah; Junaid Qadir; and Salil S. Kanhere

arXiv:2310.12523·cs.CR·October 20, 2023·1 cites

Open Access

TL;DR

This paper introduces PrivChatGPT, a conceptual framework for privacy-preserving large language models, integrating differential privacy and private training methods to protect user data during model development.

Contribution

It proposes a novel privacy-preserving model for LLMs, combining data curation, private training, and privacy measurement, serving as a benchmark for future privacy-focused AI models.

Findings

01. Differential privacy impacts model utility and accuracy.
02. Blockchain and PIR increase computational complexity.
03. The model provides measurable privacy guarantees during training.

Abstract

Generative Artificial Intelligence (AI) tools based on Large Language Models (LLMs) use billions of parameters to extensively analyse large datasets and extract critical private information such as context, specific details, and identifying information. This has raised serious threats to user privacy and a reluctance to use such tools. This article proposes a conceptual model called PrivChatGPT, a privacy-preserving model for LLMs that consists of two main components: preserving user privacy during data curation/pre-processing, together with preserving private context, and a private training process for large-scale data. To demonstrate its applicability, we show how a private mechanism could be integrated into the existing model for training LLMs to protect user privacy; specifically, we employed differential privacy and private training using Reinforcement Learning (RL).…
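The abstract names differential privacy as the mechanism integrated into training but, in the portion shown here, does not say how. As a rough illustration only, the sketch below shows one widely used way to apply differential privacy to gradient-based training: clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that norm, and average. This is an assumption about what such a step could look like, not the authors' PrivChatGPT method; the names `clip_to_norm`, `dp_mean_gradient`, `clip_norm`, and `noise_multiplier` are hypothetical and chosen for this example.

```python
import math
import random


def clip_to_norm(grad, clip_norm):
    """Scale grad down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm or norm == 0.0:
        return list(grad)
    scale = clip_norm / norm
    return [g * scale for g in grad]


def dp_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Illustrative only: clip_norm and noise_multiplier are hypothetical
    hyperparameters, not values taken from the paper.
    """
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    # Clipping bounds how much any single example can move the sum.
    clipped = [clip_to_norm(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / batch_size for x in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]
    print(dp_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

A larger `noise_multiplier` gives a stronger privacy guarantee at the cost of noisier updates, which is consistent with the finding listed above that differential privacy impacts model utility and accuracy.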

Taxonomy

Topics: Privacy-Preserving Technologies in Data