Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework
Imdad Ullah, Najm Hassan, Sukhpal Singh Gill, Basem Suleiman, Tariq Ahamed Ahanger, Zawar Shah, Junaid Qadir, and Salil S. Kanhere

TL;DR
This paper introduces PrivChatGPT, a conceptual framework for privacy-preserving large language models, integrating differential privacy and private training methods to protect user data during model development.
Contribution
It proposes a novel privacy-preserving model for LLMs, combining data curation, private training, and privacy measurement, serving as a benchmark for future privacy-focused AI models.
Findings
Differential privacy impacts model utility and accuracy.
Blockchain and PIR increase computational complexity.
The model provides measurable privacy guarantees during training.
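The first and third findings can be illustrated with a generic differential privacy example. The sketch below is not the authors' mechanism: it uses the standard Laplace mechanism on a simple counting query, and every name in it (`laplace_scale`, `private_count`, `mean_abs_error`) is hypothetical. It assumes a query with sensitivity 1 and shows that a smaller privacy budget epsilon, which gives a stronger measurable guarantee, also adds more noise and so lowers accuracy.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon for the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Epsilon-DP count: adding or removing one record changes the true
    count by at most 1, so the sensitivity is 1."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(laplace_scale(1.0, epsilon), rng)


def mean_abs_error(epsilon: float, trials: int = 2000, seed: int = 0) -> float:
    """Average absolute error of the private count over repeated runs
    on a toy dataset (illustrative data, not from the paper)."""
    rng = random.Random(seed)
    records = list(range(100))        # toy records 0..99
    true_count = 50                   # number of even values in 0..99
    total = 0.0
    for _ in range(trials):
        noisy = private_count(records, lambda r: r % 2 == 0, epsilon, rng)
        total += abs(noisy - true_count)
    return total / trials


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute error = {mean_abs_error(eps):.3f}")
```

Running the script prints the average error for three budgets; the error shrinks as epsilon grows, which is the privacy-utility trade-off the findings refer to. Private training of an LLM would apply noise to gradients rather than to a count, so this is only a small-scale analogy.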
Abstract
Generative Artificial Intelligence (AI) tools based on Large Language Models (LLMs) use billions of parameters to extensively analyse large datasets and extract critical private information such as context, specific details, and identifying information. This has raised serious threats to user privacy and made users reluctant to use such tools. This article proposes a conceptual model called PrivChatGPT, a privacy-preserving model for LLMs that consists of two main components: preserving user privacy during data curation/pre-processing, together with preserving private context, and a private training process for large-scale data. To demonstrate its applicability, we show how a private mechanism could be integrated into the existing model for training LLMs to protect user privacy; specifically, we employed differential privacy and private training using Reinforcement Learning (RL).…
Taxonomy
Topics: Privacy-Preserving Technologies in Data
