RoFL: Robust Fingerprinting of Language Models
Yun-Yun Tsai, Chuan Guo, Junfeng Yang, Laurens van der Maaten

TL;DR
This paper introduces RoFL, a robust, non-invasive fingerprinting method for identifying large language models in black-box settings, enabling license compliance verification without impacting model quality.
Contribution
RoFL provides a novel, robust fingerprinting technique for LLMs that works in black-box scenarios without requiring any changes to the model during training.
Findings
High robustness to model alterations and inference changes
Outperforms prior watermarking methods in accuracy
Effective in API-based model identification
Abstract
AI developers are releasing large language models (LLMs) under a variety of licenses. Many of these licenses restrict the ways in which the models or their outputs may be used. This raises the question of how license violations may be recognized. In particular, how can we identify that an API or product uses (an adapted version of) a particular LLM? We present a new method that enables model developers to perform such identification via fingerprints: statistical patterns that are unique to the developer's model and robust to common alterations of that model. Our method permits model identification in a black-box setting using a limited number of queries, enabling identification of models that can only be accessed via an API or product. The fingerprints are non-invasive: our method does not require any changes to the model during training; hence, by design, it does not impact model…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
