AttnDiff: Attention-based Differential Fingerprinting for Large Language Models

Haobo Zhang; Zhenhua Xu; Junxian Li; Shangfeng Sheng; Dezhang Kong; Meng Han

arXiv:2604.05502·cs.CR·April 8, 2026

TL;DR

AttnDiff is a white-box fingerprinting method that verifies model provenance by comparing differential attention patterns. It remains effective after common laundering operations (fine-tuning, pruning/compression, model merging) and across several open-source LLM families.

Contribution

The paper introduces a data-efficient framework that extracts intrinsic attention-based fingerprints from large language models, enabling provenance verification despite common laundering techniques.

Findings

01

High similarity scores (> 0.98 with M = 60 probes) between a base model and its related derivatives, across multiple model families.

02

Unrelated model families are clearly separated, with low similarity scores (< 0.22).

03

Supports practical provenance verification with as few as 5 probes.

Abstract

Protecting the intellectual property of open-weight large language models (LLMs) requires verifying whether a suspect model is derived from a victim model despite common laundering operations such as fine-tuning (including PPO/DPO), pruning/compression, and model merging. We propose AttnDiff, a data-efficient white-box framework that extracts fingerprints from models via intrinsic information-routing behavior. AttnDiff probes minimally edited prompt pairs that induce controlled semantic conflicts, captures differential attention patterns, summarizes them with compact spectral descriptors, and compares models using CKA. Across Llama-2/3 and Qwen2.5 (3B–14B) and additional open-source families, it yields high similarity for related derivatives while separating unrelated model families (e.g., > 0.98 vs. < 0.22 with M = 60 probes). With 5–60 multi-domain probes, it…
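
The pipeline named in the abstract (differential attention, spectral descriptors, CKA comparison) can be illustrated with a minimal sketch. This is not the authors' implementation: the descriptor used here (top-k singular values of each head's attention difference), the linear form of CKA, the function names, and the synthetic attention maps are all assumptions made for illustration, since the excerpt does not specify those details.

```python
import numpy as np

def spectral_descriptor(attn_a, attn_b, k=4):
    """Hypothetical descriptor: top-k singular values of each head's
    differential attention map for one prompt pair.

    attn_a, attn_b: arrays of shape (heads, T, T).
    Returns a flat vector of length heads * k.
    """
    diff = attn_a - attn_b                      # differential attention per head
    sv = np.linalg.svd(diff, compute_uv=False)  # shape (heads, T), descending order
    return sv[:, :k].reshape(-1)

def linear_cka(x, y):
    """Linear CKA between feature matrices x (n, p) and y (n, q),
    where each row corresponds to the same probe in both models."""
    x = x - x.mean(axis=0, keepdims=True)       # center features over probes
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return float(cross / (norm_x * norm_y))

# Synthetic stand-in for real attention maps (assumption: no model is loaded here).
rng = np.random.default_rng(0)

def random_attention(heads=4, T=8):
    logits = rng.normal(size=(heads, T, T))
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)    # rows sum to 1, like softmax attention

M = 10  # number of probe pairs in this toy example
base = np.stack([
    spectral_descriptor(random_attention(), random_attention())
    for _ in range(M)
])
# A "derivative" is mimicked by a small perturbation of the base descriptors.
derived = base + 0.01 * rng.normal(size=base.shape)
print("CKA(base, derived):", linear_cka(base, derived))
```

With real models, the two attention tensors would come from running the original and the minimally edited prompt through the same model, and the descriptor matrices of the victim and suspect models would be compared row by row over the same probe set.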
