Scalable Fingerprinting of Large Language Models

Anshul Nasery; Jonathan Hayase; Creston Brooks; Peiyao Sheng; Himanshu Tyagi; Pramod Viswanath; Sewoong Oh

arXiv:2502.07760 · cs.CR · October 1, 2025

TL;DR

This paper introduces a scalable fingerprinting method for large language models that embeds thousands of fingerprints without degrading model utility. The fingerprints persist after fine-tuning, and the larger fingerprint count helps against security threats such as fingerprint leakage and coalitions of users trying to bypass detection.

Contribution

We propose Perinucleus sampling, a novel scalable fingerprinting technique that embeds thousands of persistent, harmless fingerprints into large language models.

Findings

01 Can embed 24,576 fingerprints into a Llama-3.1-8B model

02 Fingerprints remain effective after fine-tuning

03 Scheme mitigates security risks associated with fingerprinting

Abstract

Model fingerprinting has emerged as a powerful tool for model owners to identify their shared model given API access. However, to lower false discovery rate, fight fingerprint leakage, and defend against coalitions of model users attempting to bypass detection, we argue that scalability is critical, i.e., scaling up the number of fingerprints one can embed into a model. Hence, we pose scalability as a crucial requirement for fingerprinting schemes. We experiment with fingerprint design at a scale significantly larger than previously considered, and introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints. We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model (two orders of magnitude more than existing schemes) without degrading the model's utility. Our inserted fingerprints persist even after…
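One way to see why the number of fingerprints matters for false discovery is a simple binomial calculation. The sketch below is an illustration only, not the paper's analysis: it assumes each queried fingerprint is matched by an unrelated model independently with a small probability, and that ownership is claimed when at least a given number of queries match. The names `false_positive_rate`, `n_queries`, `min_matches`, and `p_chance`, and the value 0.01, are hypothetical.

```python
from math import comb


def false_positive_rate(n_queries: int, min_matches: int, p_chance: float) -> float:
    """Probability that an unrelated model matches at least `min_matches`
    of `n_queries` fingerprints purely by chance.

    Assumption (not from the paper): matches are independent, each with
    probability `p_chance`, so the count of matches is binomial.
    """
    return sum(
        comb(n_queries, k) * p_chance**k * (1 - p_chance) ** (n_queries - k)
        for k in range(min_matches, n_queries + 1)
    )


# Hypothetical per-fingerprint chance-match probability.
p = 0.01
for n in (1, 8, 64):
    threshold = max(1, n // 2)  # require half of the queried fingerprints to match
    print(n, threshold, false_positive_rate(n, threshold, p))
```

Under these assumptions, checking more fingerprints with a proportional match threshold drives the chance of a false ownership claim down rapidly, which is one reason a larger fingerprint budget is useful.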

Videos

Scalable Fingerprinting of Large Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Algorithms and Data Compression