Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks
Haowei Fu, Bo Ni, Han Xu, Kunpeng Liu, Dan Lin, Tyler Derr

TL;DR
This paper evaluates the vulnerability of knowledge-injected LLMs to membership inference attacks and introduces a novel ensemble-based privacy defense framework that significantly reduces attack success while preserving answer quality.
Contribution
It systematically assesses MIA vulnerabilities in RAG and SFT LLMs and proposes a model-agnostic ensemble privacy defense framework to enhance privacy protection.
Findings
EPD reduces MIA success by up to 27.8% for SFT
EPD reduces MIA success by up to 526.3% for RAG
EPD maintains answer quality despite privacy enhancements
Abstract
Retrieval-Augmented Generation (RAG) and Supervised Finetuning (SFT) have become the predominant paradigms for equipping Large Language Models (LLMs) with external knowledge for diverse, knowledge-intensive tasks. However, while such knowledge injection improves performance, it also exposes new attack surfaces. Membership Inference Attacks (MIAs), which aim to determine whether a given data sample was included in a model's training set, pose serious threats to privacy and trust in sensitive domains. To this end, we first systematically evaluate the vulnerability of RAG- and SFT-based LLMs to various MIAs. Then, to address the privacy risk, we further introduce a novel, model-agnostic defense framework, Ensemble Privacy Defense (EPD), which aggregates and evaluates the outputs of a knowledge-injected LLM, a base LLM, and a dedicated judge model to enhance resistance against MIAs.…
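The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which attacks are evaluated or how EPD aggregates outputs. The attack shown is the generic loss-threshold MIA (a sample is guessed to be a member when the model's loss on it is low), and `ensemble_answer`, `knowledge_llm`, `base_llm`, and `judge` are hypothetical names standing in for the three components the abstract mentions.

```python
from typing import Callable, List, Sequence


def loss_threshold_mia(losses: Sequence[float], threshold: float) -> List[bool]:
    """Generic loss-threshold attack: guess 'member' when the loss is below threshold.

    Assumption: the attacker can observe a per-sample loss (or a proxy for it).
    """
    return [loss < threshold for loss in losses]


def attack_accuracy(predictions: Sequence[bool], is_member: Sequence[bool]) -> float:
    """Fraction of samples whose membership the attacker guessed correctly."""
    correct = sum(p == m for p, m in zip(predictions, is_member))
    return correct / len(is_member)


def ensemble_answer(
    question: str,
    knowledge_llm: Callable[[str], str],
    base_llm: Callable[[str], str],
    judge: Callable[[str, List[str]], str],
) -> str:
    """Hypothetical ensemble step: collect both candidate answers, let a judge decide.

    The selection rule lives entirely inside `judge`; the real EPD rule is not
    described in the abstract, so none is assumed here.
    """
    candidates = [knowledge_llm(question), base_llm(question)]
    return judge(question, candidates)


# Toy data: members tend to have lower loss than non-members.
losses = [0.1, 0.2, 1.5, 2.0]
is_member = [True, True, False, False]
predictions = loss_threshold_mia(losses, threshold=1.0)
accuracy = attack_accuracy(predictions, is_member)

# Stub models and a stub judge that always prefers the base model's answer.
answer = ensemble_answer(
    "What is the patient's diagnosis?",
    knowledge_llm=lambda q: "answer using injected knowledge",
    base_llm=lambda q: "answer from general knowledge",
    judge=lambda q, candidates: candidates[1],
)
```

In this toy setup the attack separates members from non-members perfectly because their losses do not overlap. A defense in the spirit of EPD would aim to make the final output less dependent on whether a given sample was injected, which narrows that gap.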
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data
