The 20 questions game to distinguish large language models
Gurvan Richardeau, Erwan Le Merrer, Camilla Penzo, Gilles Tredan

TL;DR
This paper introduces a method, inspired by the 20 questions game, for determining whether two black-box large language models are the same by using a small number of binary questions. The method reaches high accuracy, and the paper proposes practical heuristics for telling models apart.
Contribution
The paper formalizes the problem of identifying a model through binary questions, derives optimal bounds for it, and proposes two heuristics that outperform a random-questioning baseline in both efficiency and stealth.
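The random-questioning baseline can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that each black-box model is wrapped as a hypothetical function returning True/False for a yes/no question, and that answers are deterministic (e.g. greedy decoding), so a single disagreement shows the models differ. The names `Model` and `same_model` are illustrative.

```python
import random
from typing import Callable, Sequence

# Hypothetical wrapper: a black-box model maps a yes/no question to True/False.
Model = Callable[[str], bool]


def same_model(model_a: Model, model_b: Model, questions: Sequence[str],
               budget: int = 20, seed: int = 0) -> bool:
    """Ask up to `budget` questions drawn at random from `questions`.

    Assumes deterministic answers: any disagreement proves the models differ.
    Returns True when every sampled answer matches (models judged the same).
    """
    rng = random.Random(seed)
    sample = rng.sample(list(questions), k=min(budget, len(questions)))
    for question in sample:
        if model_a(question) != model_b(question):
            return False  # a single differing answer is enough
    return True


if __name__ == "__main__":
    # Toy stand-ins for LLMs: simple rules over numbered questions "q0".."q99".
    pool = [f"q{i}" for i in range(100)]
    m1 = lambda q: int(q[1:]) % 3 == 0
    m2 = lambda q: int(q[1:]) % 3 == 0   # behaves exactly like m1
    m3 = lambda q: int(q[1:]) % 5 == 0   # a different "model"
    print(same_model(m1, m2, pool))              # True
    print(same_model(m1, m3, pool, budget=100))  # False
```

Here "same" only means "not distinguished by the sampled questions". A fixed budget can miss models that disagree only rarely, so accuracy depends on the question budget.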
Findings
A baseline that draws questions at random from known benchmark datasets reaches nearly 100% accuracy within 20 questions.
The proposed heuristics discriminate among 22 LLMs using half as many questions as the baseline.
The methods are useful for model auditing and for copyright owners who suspect a model leak.
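To see why far fewer than 20 questions can suffice in principle, consider a generic counting argument. It is not necessarily the optimal bound derived in the paper. Assuming deterministic yes/no answers, q questions produce at most 2^q distinct answer patterns, so telling n models apart requires:

```latex
2^{q} \ge n \quad\Longrightarrow\quad q \ge \lceil \log_2 n \rceil,
\qquad n = 22:\; 2^{4} = 16 < 22 \le 32 = 2^{5} \;\Rightarrow\; q \ge 5.
```

Conversely, 20 perfectly splitting questions could in principle separate up to 2^20 = 1,048,576 models. Real benchmark questions are not guaranteed to split models that evenly.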
Abstract
In a parallel with the 20 questions game, we present a method to determine whether two large language models (LLMs), placed in a black-box context, are the same or not. The goal is to use a small set of (benign) binary questions, typically under 20. We formalize the problem and first establish a baseline using a random selection of questions from known benchmark datasets, achieving an accuracy of nearly 100% within 20 questions. After showing optimal bounds for this problem, we introduce two effective questioning heuristics able to discriminate 22 LLMs by using half as many questions for the same task. These methods offer significant advantages in terms of stealth and are thus of interest to auditors or copyright owners facing suspicions of model leaks.
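Adaptive questioning can shrink the budget further. The two heuristics are not detailed in this summary, so the sketch below substitutes a generic greedy strategy. This is a swapped-in technique, not the authors' method. It works from a hypothetical precomputed table of how each known candidate model answers each question. At each step it asks the question that splits the remaining candidates most evenly, then discards candidates whose answer disagrees with the model under test. The names `AnswerTable`, `imbalance`, and `identify` are illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical answer table: answers[model_name][question] -> True/False,
# collected beforehand by querying each known candidate model.
AnswerTable = Dict[str, Dict[str, bool]]


def imbalance(question: str, candidates: List[str], answers: AnswerTable) -> int:
    """|#yes - #no| among remaining candidates; 0 means a perfectly even split."""
    yes = sum(answers[m][question] for m in candidates)
    return abs(2 * yes - len(candidates))


def identify(unknown: Callable[[str], bool], answers: AnswerTable,
             questions: Sequence[str], budget: int = 20) -> List[str]:
    """Greedy sketch: ask the most evenly splitting question at each step and
    keep only the candidates consistent with the unknown model's reply.
    Returns the surviving candidate names (ideally exactly one)."""
    candidates = list(answers)
    remaining = list(questions)
    for _ in range(budget):
        if len(candidates) <= 1 or not remaining:
            break
        best = min(remaining, key=lambda q: imbalance(q, candidates, answers))
        if imbalance(best, candidates, answers) == len(candidates):
            break  # no remaining question separates the candidates
        remaining.remove(best)
        reply = unknown(best)
        candidates = [m for m in candidates if answers[m][best] == reply]
    return candidates
```

If the model under test matches none of the candidates, the result may be an empty list. That outcome itself suggests a model outside the known set.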
