FedAttr: Towards Privacy-preserving Client-Level Attribution in Federated LLM Fine-tuning
Su Zhang, Junfeng Guo, Heng Huang

TL;DR
FedAttr is a novel protocol for client-level attribution in federated learning. It accurately identifies which clients trained on watermarked data while preserving privacy and maintaining FL performance.
Contribution
It introduces FedAttr, a privacy-preserving attribution method that effectively detects which clients fine-tuned on watermarked data in federated LLM fine-tuning.
Findings
Achieves a 100% true positive rate (TPR) and a 0% false positive rate (FPR) in experiments.
Outperforms baseline methods by at least 44.4% in TPR or 19.1% in FPR.
Adds only 6.3% overhead to federated training time.
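The TPR and FPR figures are per-client detection rates. The page does not spell out FedAttr's evaluation protocol (for example, how results are averaged across runs). The sketch below is a minimal illustration that assumes the standard definitions and computes both rates from the set of clients an attribution method flags. The helper name `attribution_rates` and the toy client IDs are hypothetical.

```python
def attribution_rates(flagged, watermarked, all_clients):
    """Client-level TPR and FPR for one attribution decision (hypothetical helper).

    TPR = flagged watermarked clients / all watermarked clients
    FPR = flagged clean clients / all clean clients
    """
    flagged, watermarked = set(flagged), set(watermarked)
    clean = set(all_clients) - watermarked
    tpr = len(flagged & watermarked) / len(watermarked) if watermarked else 0.0
    fpr = len(flagged & clean) / len(clean) if clean else 0.0
    return tpr, fpr


# Toy setting: 10 clients, of which clients 2 and 7 trained on watermarked data.
print(attribution_rates(flagged=[2, 7], watermarked=[2, 7], all_clients=range(10)))  # → (1.0, 0.0)
print(attribution_rates(flagged=[2, 5], watermarked=[2, 7], all_clients=range(10)))  # → (0.5, 0.125)
```

A perfect result corresponds to the first call: every watermarked client is flagged and no clean client is. The second call shows how a single miss and a single false alarm pull both rates away from that ideal.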
Abstract
Watermark radioactivity tests can detect whether a model was trained on watermarked documents, and they have become key tools for protecting data ownership when fine-tuning large language models (LLMs). Existing work has demonstrated their effectiveness in centralized LLM fine-tuning. However, these methods face several challenges and remain underexplored in federated learning (FL), a widely applied paradigm for collaboratively fine-tuning LLMs on private data held by different users. FL primarily ensures privacy through secure aggregation (SA), which allows the server to aggregate updates while keeping each client's individual update private. This mechanism preserves privacy but makes it difficult to identify which client trained on watermarked documents. In this work, we propose FedAttr, a new client-level attribution protocol for FL. FedAttr identifies which clients trained on…
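Secure aggregation hides per-client contributions, and pairwise additive masking is one common way to build it. The abstract does not say which SA scheme FedAttr assumes, so the sketch below illustrates the general idea rather than FedAttr's setting. Each pair of clients shares a random mask that one client adds and the other subtracts, so the server can recover only the sum of the updates. The modulus, the seeded RNG standing in for pairwise key agreement, and the helper names `pairwise_masks`, `mask_update`, and `aggregate` are illustrative assumptions. Dropout recovery and cryptographic key exchange are omitted.

```python
import random

# Hypothetical modulus: SA schemes typically quantize float updates to integers
# and do all masking arithmetic modulo a large number like this one.
MODULUS = 2**32


def pairwise_masks(num_clients: int, dim: int, seed: int = 0) -> list[list[int]]:
    """Build one mask per client so that all masks sum to zero mod MODULUS."""
    # A single seeded RNG stands in for the pairwise key agreement a real
    # protocol would use to derive each shared mask m_ij (assumption).
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            m_ij = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + m_ij[k]) % MODULUS  # client i adds m_ij
                masks[j][k] = (masks[j][k] - m_ij[k]) % MODULUS  # client j subtracts it
    return masks


def mask_update(update: list[int], mask: list[int]) -> list[int]:
    """What a client actually sends: its quantized update plus its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates: list[list[int]]) -> list[int]:
    """Server-side sum: the pairwise masks cancel, leaving only the total."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]


if __name__ == "__main__":
    updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]  # toy quantized updates
    masks = pairwise_masks(num_clients=3, dim=3)
    masked = [mask_update(u, m) for u, m in zip(updates, masks)]
    print(aggregate(masked))  # → [111, 222, 333]
```

Each masked vector looks random to the server, yet the masks cancel in the sum. A radioactivity test run on the aggregated model can therefore indicate that watermarked data influenced training, but it cannot show which client contributed it. That is the gap client-level attribution targets.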
