Kick Bad Guys Out! Conditionally Activated Anomaly Detection in Federated Learning with Zero-Knowledge Proof Verification
Shanshan Han, Wenxuan Wu, Baturalp Buyukates, Weizhao Jin, Qifan Zhang, Yuhang Yao, Salman Avestimehr, Chaoyang He

TL;DR
RedJasper is a two-stage anomaly detection system for federated learning that effectively identifies malicious models without disrupting benign training, using zero-knowledge proofs for transparency.
Contribution
It introduces a practical, zero-knowledge proof-enabled anomaly detection framework for federated learning, addressing real-world deployment challenges.
Findings
High accuracy in detecting malicious models
Maintains performance comparable to benign scenarios
Operates without unrealistic assumptions
Abstract
Federated Learning (FL) systems are susceptible to adversarial attacks, such as model poisoning attacks and backdoor attacks. Existing defense mechanisms face critical limitations in real-world deployments, such as relying on impractical assumptions (e.g., adversaries acknowledging the presence of attacks before attacking) or undermining accuracy in model training, even in benign scenarios. To address these challenges, we propose RedJasper, a two-staged anomaly detection method specifically designed for real-world FL deployments. It identifies suspicious activities in the first stage, then activates the second stage conditionally to further scrutinize the suspicious local models, employing the $3\sigma$ rule to identify real malicious local models and filtering them out from FL training. To ensure integrity and transparency within the FL system, RedJasper integrates zero-knowledge…
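The conditional two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract and reviews only state that the first stage uses a cosine similarity threshold $\gamma$ and that the second stage applies the $3\sigma$ rule to per-model scores. Everything else here is an assumption made for illustration: the reference vector (taken to be something like the previous global model), the score definition (L2 distance to the coordinate-wise mean of the local models), the default value of `gamma`, and all function names.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two flattened model vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def stage_one_suspicious(local_models, reference, gamma):
    """Lightweight check: flag the round if any local model's similarity
    to the reference falls below gamma (hypothetical choice of reference)."""
    return any(cosine_similarity(m, reference) < gamma for m in local_models)


def stage_two_filter(local_models):
    """3-sigma rule on per-model scores; returns indices of models to keep.

    The score is an assumption for illustration: L2 distance of each model
    to the coordinate-wise mean of all submitted models.
    """
    stacked = np.stack(local_models)
    center = stacked.mean(axis=0)
    scores = np.linalg.norm(stacked - center, axis=1)
    bound = scores.mean() + 3.0 * scores.std()
    return [i for i, s in enumerate(scores) if s <= bound]


def detect(local_models, reference, gamma=0.5):
    """Run stage two only when stage one flags the round."""
    if not stage_one_suspicious(local_models, reference, gamma):
        # Benign round: aggregation proceeds with every local model.
        return list(range(len(local_models)))
    return stage_two_filter(local_models)
```

Calling `detect(models, reference)` returns the indices of the local models that would be passed on to aggregation. Note that with a score computed over n models, a single outlier can sit at most about sqrt(n - 1) standard deviations above the mean, so this particular sketch cannot flag anything unless more than ten models are submitted in a round.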
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
+ The approach to mitigate the attacks in two stages, one lightweight check to flag possible attacks, before going into a deeper analysis is interesting to reduce the computational burden in federated learning training, especially as in practice attacks may not occur very often. + The paper is well-organized and structured, with formal descriptions for the algorithms and an attempt to provide a theoretical bound on false positives. + Compared to other defenses against poisoning attacks in fede
+ Some of the assumptions are not realistic in practical federated learning settings. Thus, theorem 3.2 and the use of the 3-sigma rule rely entirely on the assumption that the “evilness scores” follow a Gaussian distribution, which does not hold in FL with non-IID local datasets or in adversarial settings: client updates can be multi-modal, heavy-tailed, and highly non-IID, especially in the presence of attackers. Consequently, the theoretical result has limited practical relevance and provides
- It correctly identifies a key practical flaw in many defenses: being "always on" (perpetually active) is suboptimal, as it can degrade model performance even in benign, attack-free scenarios. The conditional activation architecture is an elegant solution to this problem. The two-stage design is logical and clearly illustrated in Figure 1, making the proposed architecture easy to understand. Figures 1 and 3 collectively show how readily deployable the solution is, which is impressive. - The defe
- The paper emphasizes how previous defenses are based on unrealistic assumptions about the number of malicious clients. However, this defense just swaps one assumption for another. Getting rid of the assumption on the number of malicious clients makes it necessary for the protocol to rely on several thresholds, namely the $3\sigma$ rule and the cosine similarity threshold $\gamma$. These thresholds seem harder and less intuitive to set than a single assumption on the maximum number of maliciou
1. The use of ZKP to verify the server-side defense process is an innovative and important contribution, which strives to address the client-server trust problem. 2. This paper is well written and has an intuitive comparison between existing methods.
1. The novelty of the work is insufficient since using anomaly detection to identify malicious models is a very common practice. 2. The first stage's reliance on cosine similarity appears potentially vulnerable to advanced adaptive attacks (e.g., A3FL [1], Chameleon [2]). Attackers could craft models to maintain high similarity, bypassing the initial detection and rendering the entire two-stage defense ineffective. The paper's discussion of this risk is not fully convincing. 3. The evaluation f
1. The paper proposes a practical design without requiring prior knowledge of attacks or attackers. 2. The proposed method integrates verification on the server's detection process via ZKPs, enhancing transparency and trust.
1. The integration of ZKPs introduces significant additional overhead, which limits the applicability of the proposed method. 2. The technical contribution is limited since the detection methods have been well-established by existing work. The integration of ZKPs is also not novel. 3. The number of clients is small in the experiments. The evaluation is also limited to two datasets.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications
