Membership Inference Attacks for Retrieval Based In-Context Learning for Document Question Answering
Tejas Kulkarni, Antti Koskela, Laith Zumot

TL;DR
This paper demonstrates that retrieval-augmented in-context learning for document question answering is vulnerable to black-box membership inference attacks, proposes two such attacks based on query text prefixes, and evaluates them against defenses.
Contribution
It introduces two novel black-box membership inference attacks exploiting query prefixes and evaluates their effectiveness against existing defenses.
Findings
The proposed attacks outperform prior methods in many cases while using only a few query prefixes.
The second attack eliminates the need for a reference model, simplifying the process.
An ensemble prompting defense substantially mitigates the second attack.
Abstract
We show that remotely hosted applications employing in-context learning when augmented with a retrieval function to select in-context examples can be vulnerable to membership-inference attacks even when the service provider and users are separate parties. We propose two black-box membership inference attacks that exploit query text prefixes to distinguish member from non-member inputs. The first attack uses a reference model to estimate an otherwise unavailable loss metric. The second attack improves upon it by eliminating the reference model and instead computing a membership statistic through a simple but novel weighted-averaging scheme. Our comprehensive empirical evaluations consider a stricter case in which the adversary has a paraphrased version of the text in the queries and show that our attacks can exhibit stronger resilience to paraphrasing and outperform three prior attacks…
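To make the idea of a prefix-based membership statistic concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the adversary can only send text to a black-box service (the hypothetical `query_fn`), sends several prefixes of a candidate text, and scores how closely each response matches the true remainder. The weighting used here (shorter prefixes count more), the similarity measure, the helper names, and the toy service are all assumptions introduced for illustration; the abstract does not specify the paper's weighting scheme.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple


def split_at(text: str, fraction: float) -> Tuple[str, str]:
    """Split text into (prefix, remainder) at a word boundary."""
    words = text.split()
    # Keep at least one word in the prefix and, when possible, one in the remainder.
    cut = max(1, min(len(words) - 1, round(len(words) * fraction)))
    return " ".join(words[:cut]), " ".join(words[cut:])


def membership_score(
    text: str,
    query_fn: Callable[[str], str],
    fractions: Sequence[float] = (0.25, 0.5, 0.75),
) -> float:
    """Weighted average of how well the service completes each prefix.

    Assumed weighting (hypothetical, not from the paper): a prefix that
    reveals less of the text gets a larger weight, since completing the
    text from less context is treated as stronger evidence of membership.
    """
    weights = [1.0 - f for f in fractions]
    total = 0.0
    for f, w in zip(fractions, weights):
        prefix, remainder = split_at(text, f)
        completion = query_fn(prefix)
        # Character-level similarity in [0, 1] between response and true remainder.
        total += w * SequenceMatcher(None, completion, remainder).ratio()
    return total / sum(weights)


def make_toy_service(store: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a remote service that leaks stored text."""
    def query(prefix: str) -> str:
        for doc in store:
            if doc.startswith(prefix):
                return doc[len(prefix):].strip()
        return "I do not know."
    return query
```

In this toy setting, a text held by the service is completed exactly and scores close to 1, while an unseen text scores lower. A real attack would additionally need a decision threshold and some handling of paraphrased inputs, neither of which is modeled here.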
