Traceable Black-box Watermarks for Federated Learning

Jiahao Xu; Rui Hu; Olivera Kotevska; Zikai Zhang

arXiv:2505.13651 · cs.CR · February 10, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces TraMark, a novel method for injecting traceable black-box watermarks into federated learning models, enabling model leakage verification without compromising main task performance.

Contribution

The paper formalizes the problem of traceable black-box watermarking in federated learning and proposes a server-side method, TraMark, to create uniquely watermarked models for each client.

Findings

01. TraMark ensures that each watermarked model can be traced to its client.

02. Watermarked models maintain main task performance.

03. TraMark is effective across various federated learning systems.

Abstract

Due to the distributed nature of Federated Learning (FL) systems, each local client has access to the global model, which poses a critical risk of model leakage. Existing works have explored injecting watermarks into local models to enable intellectual property protection. However, these methods either focus on non-traceable watermarks or traceable but white-box watermarks. We identify a gap in the literature regarding the formal definition of traceable black-box watermarking and the formulation of the problem of injecting such watermarks into FL systems. In this work, we first formalize the problem of injecting traceable black-box watermarks into FL. Based on the problem, we propose a novel server-side watermarking method, TraMark, which creates a traceable watermarked model for each client, enabling verification of model leakage in black-box settings. To achieve this,…
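To make the idea of black-box tracing concrete, here is a minimal sketch of how a leaked model might be attributed to a client using only its predictions. It is an illustration, not the paper's procedure: the per-client trigger sets, the names `trigger_accuracy` and `trace_leak`, and the 0.9 threshold are all assumptions introduced for this example, and each trigger set is assumed to be non-empty.

```python
from typing import Callable, Dict, Hashable, Optional, Sequence, Tuple

# Hypothetical type: a trigger set is a list of (input, expected_label) pairs.
TriggerSet = Sequence[Tuple[Hashable, int]]


def trigger_accuracy(predict: Callable[[Hashable], int], triggers: TriggerSet) -> float:
    """Fraction of trigger inputs on which the suspect model returns the watermark label."""
    hits = sum(1 for x, y in triggers if predict(x) == y)
    return hits / len(triggers)


def trace_leak(
    predict: Callable[[Hashable], int],
    trigger_sets: Dict[str, TriggerSet],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Optional[str]:
    """Return the client whose trigger set the suspect model matches best,
    or None if no client's score reaches the threshold."""
    scores = {cid: trigger_accuracy(predict, ts) for cid, ts in trigger_sets.items()}
    best = max(scores, key=lambda cid: scores[cid])
    return best if scores[best] >= threshold else None


# Toy usage: the "leaked" model has memorized client_a's triggers only.
trigger_sets = {
    "client_a": [("a1", 1), ("a2", 2)],
    "client_b": [("b1", 3), ("b2", 4)],
}
leaked = lambda x: {"a1": 1, "a2": 2}.get(x, 0)
clean = lambda x: 0

print(trace_leak(leaked, trigger_sets))
print(trace_leak(clean, trigger_sets))
```

The verifier only calls `predict`, so it never inspects weights, which is what distinguishes a black-box check from a white-box one.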

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 5

Strengths

S1. TraMark partitions the model parameters into main-task and watermarking-task regions. In this way, we can say that TraMark is model-agnostic.

S2. The problem formulation and the insights in Section 2 are very clear. It is difficult to inject watermarks that satisfy black-box traceability while avoiding collusion, and I think the authors did a good job of formulating this difficulty.
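The partitioning that S1 describes can be pictured with a small sketch. This is an assumption-laden illustration rather than the paper's algorithm: the flat parameter lists, the boolean `main_mask`, and the function name `partitioned_aggregate` are hypothetical. The idea shown is that the server averages clients only over the main-task region and leaves each client's watermarking region untouched, so every client receives a distinct model.

```python
from typing import List


def partitioned_aggregate(
    local_models: List[List[float]],        # parameters returned by each client
    watermarked_models: List[List[float]],  # per-client models holding watermark values
    main_mask: List[bool],                  # True = main-task region, False = watermark region
) -> List[List[float]]:
    """Build one personalized model per client.

    Main-task positions take the plain average over all clients;
    watermark positions keep that client's own watermark values.
    """
    n = len(local_models)
    dim = len(main_mask)
    mean = [sum(m[j] for m in local_models) / n for j in range(dim)]
    return [
        [mean[j] if main_mask[j] else wm[j] for j in range(dim)]
        for wm in watermarked_models
    ]


# Toy usage: two clients, three parameters, the last one reserved for the watermark.
local_models = [[1.0, 2.0, 9.0], [3.0, 4.0, 7.0]]
watermarked_models = [[0.0, 0.0, 0.5], [0.0, 0.0, -0.5]]
main_mask = [True, True, False]

for model in partitioned_aggregate(local_models, watermarked_models, main_mask):
    print(model)
```

Because the split is defined over parameter positions rather than layer types, a scheme of this shape does not depend on the model architecture, which is consistent with the reviewer's model-agnostic remark.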

Weaknesses

W1. Potential watermark detectability by malicious clients: Watermarked weights may be relatively easy to detect by malicious clients. They could analyze which parameters change more significantly or differently during training and identify trends in weight updates of received global models. Such differences, especially if certain parameters are updated disproportionately or the updates remain closer to zero, could reveal which weights are used to embed covert information (i.e., the watermark

Reviewer 02 · Rating 4 · Confidence 4

Strengths

A clear formalization of the traceable black-box watermarking problem is provided in Sec. 3. The experimental results demonstrate an excellent Verification Rate (VR) of approximately 99.17% with a limited drop in Main-task Accuracy (MA), especially when compared to the FedTracker method. The evaluation is conducted across multiple datasets, and the analyses of robustness, hyperparameters, and other factors are detailed.

Weaknesses

As indicated in Table 8 of the appendix, the per-round computational overhead for TraMark's aggregation is over 70 times that of FedAvg. The authors rationalize this by citing the small number of clients in cross-silo scenarios, a justification that is not entirely convincing and severely limits the method's scope of application.

The paper appears to overlook the method's communication overhead. At the beginning of each training round, the server is required to send a unique, personalized model

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- This paper has a clear motivation to achieve black-box traceability in FL.
- This paper provides a comprehensive evaluation of various datasets and both IID and non-IID settings.
- This paper is generally well-written and easy to follow.

Weaknesses

- Insufficient ablation study. To better justify the necessity of the parameter partitioning, it may be better for the authors to include a comparison against a simpler baseline that does not use this partitioning. This would help quantify the impact of watermark collisions or main task degradation that the partitioning scheme is designed to prevent.
- Inadequate evaluation of computational overhead. The proposed method requires the server to perform personalized watermark injection (fine-tuning

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Blockchain Technology Applications and Security
