ProToken: Token-Level Attribution for Federated Large Language Models

Waris Gill; Ahmad Humayun; Ali Anwar; Muhammad Ali Gulzar

arXiv:2601.19672 · cs.LG · January 29, 2026

TL;DR

ProToken is a token-level attribution method for federated LLMs. It identifies which clients contributed to generated text while preserving privacy, and it attains high attribution accuracy across multiple models and domains.

Contribution

The paper presents ProToken, a novel provenance methodology that accurately attributes generated text to clients at the token level in federated LLMs without compromising privacy.

Findings

01

Achieves 98% average attribution accuracy.

02

Maintains high accuracy as the number of clients increases.

03

Effective across diverse LLM architectures and domains.

Abstract

Federated Learning (FL) enables collaborative training of Large Language Models (LLMs) across distributed data sources while preserving privacy. However, when federated LLMs are deployed in critical applications, it remains unclear which client(s) contributed to specific generated responses, hindering debugging, malicious client identification, fair reward allocation, and trust verification. We present ProToken, a novel Provenance methodology for Token-level attribution in federated LLMs that addresses client attribution during autoregressive text generation while maintaining FL privacy constraints. ProToken leverages two key insights to enable provenance at each token: (1) transformer architectures concentrate task-specific signals in later blocks, enabling strategic layer selection for computational tractability, and (2) gradient-based relevance weighting filters out irrelevant neural…
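The two insights named in the abstract can be illustrated with a minimal sketch. It is not ProToken's algorithm: the abstract is truncated and does not say how client contributions are scored. The per-client activation vectors, the use of |gradient| as the relevance weight, the weighted cosine similarity, the softmax over clients, and the function names `select_late_layers`, `weighted_cosine`, and `attribute_token` are all hypothetical assumptions made for illustration.

```python
"""Hypothetical sketch of gradient-weighted, late-layer client attribution.

Assumptions (not taken from the paper): for the current generation step,
each client's model yields one activation vector per transformer block, the
global model supplies the gradient of the generated token's logit with
respect to its own activations, and a client's score is the
relevance-weighted cosine similarity between its activations and the
global model's, averaged over the selected blocks.
"""
import math


def select_late_layers(num_layers, k):
    """Insight (1), assumed form: keep only the last k transformer blocks."""
    return list(range(num_layers - k, num_layers))


def weighted_cosine(a, b, w):
    """Cosine similarity in which each neuron is scaled by a non-negative weight."""
    dot = sum(wi * ai * bi for ai, bi, wi in zip(a, b, w))
    na = math.sqrt(sum(wi * ai * ai for ai, wi in zip(a, w)))
    nb = math.sqrt(sum(wi * bi * bi for bi, wi in zip(b, w)))
    return dot / (na * nb) if na and nb else 0.0


def attribute_token(global_acts, global_grads, client_acts, k):
    """Return a hypothetical attribution distribution over clients for one token."""
    layers = select_late_layers(len(global_acts), k)
    scores = {}
    for client, acts in client_acts.items():
        total = 0.0
        for layer in layers:
            # Insight (2), assumed form: weight each neuron by |gradient| so
            # neurons irrelevant to this token barely affect the similarity.
            relevance = [abs(g) for g in global_grads[layer]]
            total += weighted_cosine(acts[layer], global_acts[layer], relevance)
        scores[client] = total / len(layers)
    # Softmax turns the mean per-client scores into a distribution.
    m = max(scores.values())
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}


# Usage with toy values: four blocks, three neurons each, two clients.
global_acts = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.5], [0.2, 1.0, 0.0]]
global_grads = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.1], [0.0, 2.0, 0.0]]
clients = {
    "A": [row[:] for row in global_acts],
    "B": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
}
probs = attribute_token(global_acts, global_grads, clients, k=2)
```

In this toy setup client "A" matches the global model on the high-relevance neurons of the last two blocks and receives most of the probability mass, while client "B" only differs on neurons the gradient marks as irrelevant or inactive. Restricting the scoring to the final k blocks keeps the per-token cost proportional to k rather than to the full depth of the model, which is the kind of tractability argument insight (1) makes.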

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Artificial Intelligence in Healthcare and Education · Advanced Graph Neural Networks