Fingerprinting Fine-tuned Language Models in the Wild
Nirav Diwan, Tanmoy Chakraborty, Zubair Shafiq

TL;DR
This paper investigates the challenge of fingerprinting fine-tuned language models in real-world scenarios, demonstrating that fine-tuning significantly aids in attributing synthetic text to its source model.
Contribution
It introduces a large-scale study on fingerprinting fine-tuned LMs, highlighting the limitations of existing methods and showing fine-tuning as a key factor for attribution.
Findings
Fine-tuning improves attribution accuracy.
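Existing fingerprinting methods have been evaluated on only a handful (usually < 10) of pre-trained LMs, which limits them when the pool of candidate fine-tuned LMs is much larger.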
Fine-tuning is the most effective feature for attribution.
Abstract
There are concerns that the ability of language models (LMs) to generate high-quality synthetic text can be misused to launch spam, disinformation, or propaganda. Therefore, the research community is actively working on developing approaches to detect whether a given text is organic or synthetic. While this is a useful first step, it is important to be able to further fingerprint the author LM to attribute its origin. Prior work on fingerprinting LMs is limited to attributing synthetic text generated by a handful (usually < 10) of pre-trained LMs. However, LMs such as GPT2 are commonly fine-tuned in a myriad of ways (e.g., on a domain-specific text corpus) before being used to generate synthetic text. It is challenging to fingerprint fine-tuned LMs because the universe of fine-tuned LMs is much larger in realistic scenarios. To address this challenge, we study the problem of…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
