Fingerprinting Fine-tuned Language Models in the Wild

Nirav Diwan; Tanmoy Chakraborty; Zubair Shafiq

arXiv:2106.01703·cs.CL·June 4, 2021

TL;DR

This paper investigates the challenge of fingerprinting fine-tuned language models in real-world scenarios, demonstrating that fine-tuning significantly aids in attributing synthetic text to its source model.

Contribution

It presents a large-scale study of fingerprinting fine-tuned LMs, highlighting the limitations of existing methods and identifying fine-tuning itself as a key factor for attribution.

Findings

01

Fine-tuning improves attribution accuracy.

02

Existing fingerprinting methods have limitations when scaled to a large number of fine-tuned models.

03

Fine-tuning is the most effective feature for attribution.

Abstract

There are concerns that the ability of language models (LMs) to generate high-quality synthetic text can be misused to launch spam, disinformation, or propaganda. Therefore, the research community is actively working on developing approaches to detect whether a given text is organic or synthetic. While this is a useful first step, it is important to be able to further fingerprint the author LM to attribute its origin. Prior work on fingerprinting LMs is limited to attributing synthetic text generated by a handful (usually < 10) of pre-trained LMs. However, LMs such as GPT2 are commonly fine-tuned in a myriad of ways (e.g., on a domain-specific text corpus) before being used to generate synthetic text. It is challenging to fingerprint fine-tuned LMs because the universe of fine-tuned LMs is much larger in realistic scenarios. To address this challenge, we study the problem of…
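The attribution problem described in the abstract can be framed as multi-class classification: given a piece of synthetic text, decide which of N candidate LMs produced it. The sketch below is an illustrative baseline only, not the authors' method; it assumes a few known sample texts per candidate model and swaps in a simple technique (character trigram profiles compared by cosine similarity). The model names and the helpers `char_ngrams`, `cosine`, `build_profiles`, and `attribute` are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(samples_by_model: dict, n: int = 3) -> dict:
    """Aggregate the n-gram counts of all known samples for each candidate LM."""
    profiles = {}
    for model, samples in samples_by_model.items():
        profile = Counter()
        for sample in samples:
            profile.update(char_ngrams(sample, n))
        profiles[model] = profile
    return profiles


def attribute(text: str, profiles: dict, n: int = 3) -> str:
    """Return the candidate LM whose profile is most similar to `text`."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda model: cosine(query, profiles[model]))


# Hypothetical fine-tuned models, each with a couple of known generations.
samples = {
    "gpt2-finetuned-recipes": [
        "Preheat the oven and whisk the eggs with sugar.",
        "Stir the flour into the butter until smooth.",
    ],
    "gpt2-finetuned-sports": [
        "The striker scored twice in the second half.",
        "The coach praised the defense after the match.",
    ],
}
profiles = build_profiles(samples)
print(attribute("Whisk the butter and sugar, then preheat the oven.", profiles))
```

A surface-level profile like this mostly picks up the topic of each fine-tuning corpus; it is meant only to make the task concrete, and is not expected to scale to the large number of fine-tuned LMs the paper considers.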

Code & Models

Repositories

LCS2-IIITD/ACL-FFLM
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection