Artificial Interrogation for Attributing Language Models
Farhan Dhanani, Muhammad Rafi

TL;DR
This paper addresses the challenge of attributing fine-tuned language models to their base models by developing interrogation strategies and multiple response similarity measures, achieving effective model attribution under restricted API access.
Contribution
The paper introduces four novel approaches for model attribution using response similarity metrics, advancing the methodology for identifying model origins in a restricted API setting.
Findings
Response similarity metrics effectively distinguish models
Transformer-based classifiers improve attribution accuracy
Multiple approaches outperform baseline methods
Abstract
This paper presents solutions to the Machine Learning Model Attribution Challenge (MLMAC), jointly organized by MITRE, Microsoft, Schmidt Futures, Robust Intelligence, Lincoln Network, and the Hugging Face community. The challenge provides twelve open-source base versions of popular language models developed by well-known organizations and twelve fine-tuned language models for text generation. The names and architecture details of the fine-tuned models were kept hidden, and participants could access these models only through REST APIs developed by the organizers. Given these constraints, the goal of the contest is to identify which base model each fine-tuned model originated from. To solve this challenge, we have assumed that fine-tuned models and their corresponding base versions must share a similar vocabulary set with a matching syntactical writing style that resonates in their…
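The shared-vocabulary assumption stated in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes responses from every model have already been collected for a common set of prompts, and it swaps in plain Jaccard overlap of word sets as the similarity measure. The function names (`vocabulary`, `jaccard`, `attribute`) and the dictionary layout are hypothetical, chosen only for illustration.

```python
import re


def vocabulary(text):
    """Return the set of lower-cased word tokens in one response."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pooled_vocabulary(responses):
    """Union of the vocabularies of all responses from one model."""
    return set().union(*(vocabulary(r) for r in responses))


def attribute(fine_tuned_responses, base_responses):
    """Assign each fine-tuned model the base model with the closest vocabulary.

    Both arguments are assumed to map a model name to the list of texts that
    model generated for the same prompts (hypothetical layout).
    """
    base_vocab = {
        name: pooled_vocabulary(texts) for name, texts in base_responses.items()
    }
    result = {}
    for ft_name, texts in fine_tuned_responses.items():
        ft_vocab = pooled_vocabulary(texts)
        # Pick the base model whose pooled vocabulary overlaps the most.
        result[ft_name] = max(
            base_vocab, key=lambda base: jaccard(ft_vocab, base_vocab[base])
        )
    return result
```

Calling `attribute` with two dictionaries of collected responses returns one base-model name per fine-tuned model. A vocabulary-only score ignores the syntactic writing style the abstract also mentions, so it should be read as a baseline illustration rather than a complete attribution procedure.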
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
Methods: Multi-Head Attention · Softmax · Layer Normalization · Adam · Linear Layer · Dense Connections · Residual Connection · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing
