FedThief: Harming Others to Benefit Oneself in Self-Centered Federated Learning

Xiangyu Zhang; Mang Ye

arXiv:2509.00540·cs.LG·September 3, 2025

TL;DR

FedThief introduces a novel attack in federated learning where malicious participants degrade the global model while simultaneously improving their own models using divergence-aware ensemble techniques, highlighting new security challenges.

Contribution

This paper presents FedThief, a new attack framework that enables attackers to harm the global model and enhance their private models simultaneously in federated learning.

Findings

1. FedThief effectively degrades global model performance.
2. The attacker's private models outperform the global model.
3. Divergence-aware ensemble techniques improve attacker gains.

Abstract

In federated learning, participants' uploaded model updates cannot be directly verified, leaving the system vulnerable to malicious attacks. Existing attack strategies have adversaries upload tampered model updates to degrade the global model's performance. However, attackers also degrade their own private models, gaining no advantage. In real-world scenarios, attackers are driven by self-centered motives: their goal is to gain a competitive advantage by developing a model that outperforms those of other participants, not merely to cause disruption. In this paper, we study a novel Self-Centered Federated Learning (SCFL) attack paradigm, in which attackers not only degrade the performance of the global model through attacks but also enhance their own models within the federated learning process. We propose a framework named FedThief, which degrades the performance of the global model by…

Tables (6)

Table 1. TABLE I: Summary of Notation

Symbol	Description
$N$	Total number of clients
$𝒦, 𝒦_{m}$	All clients; subset of malicious clients
$C$	Number of classes
$D_{k}^{train}, D_{k}^{val}$	Training and validation splits of client $c_{k}$
$x_{k}^{i}, y_{k}^{i}$	$i$-th input and one-hot label of client $c_{k}$
$θ_{p}, θ_{m}, θ_{e}$	Parameters of private, malicious, and error models
$g_{p}^{t}, g_{m}^{t}$	Private and malicious gradients at round $t$
${\tilde{g}}_{m}^{t}$	Perturbed gradient after Byzantine attack
$𝒜 (\cdot)$	Byzantine attack function
$δ, β$	Perturbation vector and its magnitude
$g^{t}$	Global aggregated gradient at round $t$
$η, λ$	Learning rate; loss trade-off weight
$ℒ_{CE}, ℒ_{KD}$	Cross-entropy and KL divergence losses
$ℰ_{k}^{t}, ℒ^{t}$	Local ensemble head; global ensemble model

Table 2. TABLE II: The accuracy rates of the global model for benign clients ($Acc_{g}$) and the ensemble model for malicious clients ($Acc_{e}$) under a range of adversarial attacks and defensive strategies. $\alpha$ represents the proportion of malicious clients.

			Attack Method
			LIE [49]				Min-Sum [32]				FedGhost [50]
			$α = 0.2$		$α = 0.4$		$α = 0.2$		$α = 0.4$		$α = 0.2$		$α = 0.4$
Dataset (Model)	Aggregation Method	No Attack $\tilde{Acc}_{g}$	$Acc_{g}$	$Acc_{e}$ ($Δ_{mal}$)	$Acc_{g}$	$Acc_{e}$ ($Δ_{mal}$)	$Acc_{g}$	$Acc_{e}$ ($Δ_{mal}$)	$Acc_{g}$	$Acc_{e}$ ($Δ_{mal}$)	$Acc_{g}$	$Acc_{e}$ ($Δ_{mal}$)	$Acc_{g}$	$Acc_{e}$ ($Δ_{mal}$)
MNIST (CNN)	FedAvg [14]	98.60	98.17	98.42(+0.25)	97.73	98.12(+0.39)	98.43	98.57(+0.14)	97.96	98.42(+0.46)	98.48	98.50(+0.02)	98.33	98.44(+0.11)
	FedProx [59]	98.62	98.20	98.44(+0.24)	97.76	98.16(+0.40)	98.22	98.25(+0.03)	98.07	98.18(+0.11)	98.51	98.51(+0.00)	98.34	98.44(+0.10)
	Bulyan [54]	98.64	98.01	97.97(-0.04)	91.97	97.98(+6.01)	98.46	98.13(-0.33)	93.94	98.08(+4.14)	98.01	97.93(-0.08)	97.51	97.61(+0.10)
	Multi-Krum [51]	98.60	98.13	98.30(+0.17)	96.11	98.06(+1.95)	98.42	98.53(+0.11)	96.44	98.08(+1.64)	98.46	98.28(-0.18)	98.08	98.09(+0.01)
	Trimmed-mean [53]	98.58	98.14	98.20(+0.06)	97.05	98.06(+1.01)	98.52	98.13(-0.39)	97.65	98.05(+0.40)	98.54	98.45(-0.09)	98.26	98.32(+0.06)
	Median [53]	98.56	98.16	98.44(+0.28)	96.21	98.07(+1.86)	98.16	98.36(+0.20)	97.78	98.02(+0.24)	97.75	97.80(+0.05)	97.61	97.75(+0.14)
FASHION (CNN)	FedAvg [14]	85.67	84.87	85.11(+0.24)	84.14	84.87(+0.73)	84.77	84.92(+0.15)	84.01	84.92(+0.91)	85.25	85.25(+0.00)	85.11	85.18(+0.07)
	FedProx [59]	85.62	84.86	85.04(+0.18)	84.11	84.93(+0.82)	84.84	84.96(+0.12)	84.76	84.92(+0.16)	85.26	85.30(+0.04)	85.06	85.15(+0.09)
	Bulyan [54]	85.56	84.57	84.61(+0.04)	78.36	84.08(+5.72)	84.38	84.60(+0.22)	84.07	84.45(+0.38)	84.45	84.30(-0.15)	84.09	84.22(+0.13)
	Multi-Krum [51]	85.68	84.82	85.25(+0.43)	81.05	84.21(+3.16)	84.69	85.28(+0.59)	82.26	84.47(+2.21)	84.64	84.34(-0.30)	84.21	84.48(+0.27)
	Trimmed-mean [53]	85.67	84.64	85.32(+0.68)	83.09	84.50(+1.41)	84.54	85.30(+0.76)	83.73	84.77(+1.04)	84.55	84.60(+0.05)	84.41	84.48(+0.07)
	Median [53]	85.06	84.75	85.32(+0.57)	82.68	84.57(+1.89)	84.16	84.36(+0.20)	84.71	85.21(+0.50)	84.22	84.46(+0.24)	83.41	84.08(+0.67)
CIFAR-10 (AlexNet)	FedAvg [14]	61.28	50.79	53.05(+2.26)	41.43	52.24(+10.81)	54.90	57.55(+2.65)	38.84	55.99(+17.15)	55.19	56.05(+0.86)	54.40	56.47(+2.07)
	FedProx [59]	61.50	51.21	53.21(+2.00)	42.04	52.16(+10.12)	52.16	54.03(+1.87)	35.40	56.96(+21.56)	56.00	56.83(+0.83)	55.01	57.10(+2.09)
	Bulyan [54]	59.42	46.27	50.53(+4.26)	16.23	51.61(+35.38)	52.96	53.10(+0.14)	22.18	53.65(+31.47)	54.06	54.03(-0.03)	49.52	53.37(+3.85)
	Multi-Krum [51]	56.17	49.56	52.52(+2.96)	30.06	51.78(+21.72)	50.02	50.18(+0.16)	34.58	51.84(+17.26)	53.09	53.61(+0.52)	51.05	53.01(+1.96)
	Trimmed-mean [53]	56.27	46.67	49.70(+3.03)	35.66	51.88(+16.22)	48.21	52.25(+4.04)	30.09	50.93(+20.84)	52.88	53.27(+0.39)	51.55	52.80(+1.25)
	Median [53]	55.25	49.43	50.90(+1.47)	31.64	51.69(+20.05)	47.94	51.42(+3.48)	29.19	49.02(+19.83)	51.08	52.03(+0.95)	43.94	51.36(+7.42)
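
As a reading aid for Table II, the parenthesized value in each $Acc_{e}$ cell is the gain $Δ_{mal} = Acc_{e} - Acc_{g}$ defined in the equation list. The short sketch below recomputes two CIFAR-10 entries from the table; the helper name `delta_mal` is ours and purely illustrative.

```python
def delta_mal(acc_e: float, acc_g: float) -> float:
    """Gain of the attacker's ensemble over the global model, in accuracy points."""
    return round(acc_e - acc_g, 2)

# (Acc_g, Acc_e) pairs copied from the CIFAR-10 block of Table II
examples = {
    "Bulyan, LIE, alpha=0.4": (16.23, 51.61),
    "FedAvg, Min-Sum, alpha=0.4": (38.84, 55.99),
}

for name, (acc_g, acc_e) in examples.items():
    print(f"{name}: {delta_mal(acc_e, acc_g):+.2f}")
```

Both recomputed gains agree with the values printed in the table (+35.38 and +17.15).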

Table 3. TABLE III: Performance of FedThief under data poisoning attacks with Bulyan aggregation ($\alpha=0.4$). Each entry shows global and ensemble accuracy, respectively.

	Symmetry Flipping		Pair Flipping
Dataset	$Acc_{g}$	$Acc_{e}$ ($Δ_{mal}$)	$Acc_{g}$	$Acc_{e}$ ($Δ_{mal}$)
MNIST	97.28	98.05(+0.77)	51.50	98.06(+46.56)
FASHION	81.44	84.43(+2.99)	41.84	83.69(+41.85)
CIFAR10	39.75	49.69(+9.94)	26.12	49.42(+23.30)

Table 4. TABLE IV: Impact of the private dataset partition ratio $v$, evaluated on CIFAR-10 with $\alpha=0.2$. Attack: LIE. Aggregators: Bulyan and Multi-Krum.

	Bulyan			Multi-Krum
	$v = 2$	$v = 5$	$v = 10$	$v = 2$	$v = 5$	$v = 10$
$Acc_{g}$	49.09	46.27	47.01	51.57	49.56	49.83
$Acc_{e}$	50.92	50.53	50.79	53.39	52.52	51.79
$Δ_{mal}$	+1.83	+4.26	+3.78	+1.82	+2.96	+1.96

Table 5. TABLE V: Accuracy of the ensemble model when including each component. ✓ means the component is included. Min-Sum is used under FedAvg aggregation.

			MNIST		FASHION		CIFAR10
$θ_{p}$	$θ_{m}$	$θ_{e}$	$α = 0.2$	$α = 0.4$	$α = 0.2$	$α = 0.4$	$α = 0.2$	$α = 0.4$
✓			98.30	98.03	82.85	83.11	49.27	50.18
✓	✓		98.43	98.06	83.61	83.17	54.62	52.99
✓		✓	98.16	97.85	81.49	82.01	52.28	53.06
✓	✓	✓	98.57	98.42	84.92	84.92	57.55	55.99

Table 6. TABLE VI: Ensemble accuracy under different distillation temperatures on CIFAR-10 (Min-Sum, FedAvg, $\alpha=0.2$).

Temperature $T$	1	2	3	5
Accuracy (%)	55.12	57.30	57.55	57.53

Equations (25)

$$\tilde{g}_{m} = g_{m} + z, \quad z_{i} \sim \mathcal{N}(0, \alpha \sigma_{i}).$$

$$\sum_{j=1}^{N} \| \tilde{g}_{m} - g_{j} \|^{2} \leq \tau,$$

$$\mathrm{score}(g_{i}) = \sum_{j \in \mathcal{N}_{i}} \| g_{i} - g_{j} \|_{2}^{2},$$

$$g_{\mathrm{multi}} = \frac{1}{m} \sum_{i \in S_{m}} g_{i}.$$

$$\cos(g_{i}, g_{j}) = \frac{\langle g_{i}, g_{j} \rangle}{\| g_{i} \|_{2} \, \| g_{j} \|_{2}},$$

$$w_{i} \propto 1 - \max_{j \neq i} \cos(g_{i}, g_{j}), \quad g_{\mathrm{agg}} = \sum_{i=1}^{N} w_{i} g_{i}.$$

$$[g_{\mathrm{trim}}]_{d} = \frac{1}{N - 2f} \sum_{i=f+1}^{N-f} g_{(i),d},$$

$$[g_{\mathrm{med}}]_{d} = \mathrm{median}\{g_{1,d}, \dots, g_{N,d}\}.$$

$$g_{\mathrm{ref}} = \nabla \ell(\theta; D_{\mathrm{proxy}}),$$

$$\alpha_{i} = \max(0, \cos(g_{i}, g_{\mathrm{ref}})), \quad g_{\mathrm{agg}} = \frac{\sum_{i} \alpha_{i} g_{i}}{\sum_{i} \alpha_{i}}.$$

$$f(\theta_{k}) : \mathbb{R}^{d_{\text{in}}} \to \mathbb{R}^{C}.$$

$$g_{m}^{t} = \nabla f(\theta_{m}^{t}; D_{k}^{\text{train}}).$$

$$\tilde{g}_{m}^{t} = \mathcal{A}(g_{m}^{t}) = g_{m}^{t} + \beta \cdot \delta,$$

$$g^{t} = \mathrm{Aggregate}(\{\tilde{g}_{m}^{t}\} \cup \{g_{\text{benign}}^{t}\}).$$

$$\theta_{m}^{t+1} = \theta_{m}^{t} - \eta g^{t}.$$

$$g_{p}^{t} = \nabla f(\theta_{p}^{t}; D_{k}^{\text{train}}).$$

$$\theta_{p}^{t+1} = \theta_{p}^{t} - \eta \cdot \frac{1}{|\mathcal{K}_{m}|} \sum_{k \in \mathcal{K}_{m}} g_{p,k}^{t}.$$

$$\theta_{e}^{t+1} = \theta_{e}^{t} - \eta \cdot \tilde{g}_{m}^{t}.$$

$$\min_{\mathcal{E}} \sum_{\mathbf{x} \in D_{k}^{\text{val}}} \mathcal{L}(z_{m}, z_{p}, z_{e}),$$

$$\mathcal{L}^{t+1} = \frac{1}{|\mathcal{K}_{m}|} \sum_{k \in \mathcal{K}_{m}} \mathcal{E}_{k}^{t},$$

$$\mathcal{L}_{CE} = -\sum_{i=1}^{C} y^{i} \log f(\theta_{p}^{t+1}),$$

$$\mathcal{L}_{KD} = \sum_{i} P_{i} \log \frac{P_{i}}{Q_{i}},$$

$$\mathcal{L}_{total} = \lambda \cdot \mathcal{L}_{CE} + (1 - \lambda) \cdot \mathcal{L}_{KD},$$

$$\theta_{p}^{t+1} = \theta_{p}^{t+1} - \eta \cdot \nabla \mathcal{L}_{total},$$

$$\Delta_{mal} = Acc_{e} - Acc_{g},$$

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Full text

FedThief: Harming Others to Benefit Oneself in Self-Centered Federated Learning

Xiangyu Zhang, Mang Ye

Xiangyu Zhang is with the Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China (e-mail: [email protected]). Mang Ye is with the School of Computer Science, Taikang Center for Life and Medical Sciences, Wuhan University, Wuhan 430071, China (e-mail: [email protected]). Corresponding author: Mang Ye.

Abstract

In federated learning, participants’ uploaded model updates cannot be directly verified, leaving the system vulnerable to malicious attacks. Existing attack strategies have adversaries upload tampered model updates to degrade the global model’s performance. However, attackers also degrade their own private models, gaining no advantage. In real-world scenarios, attackers are driven by self-centered motives: their goal is to gain a competitive advantage by developing a model that outperforms those of other participants, not merely to cause disruption. In this paper, we study a novel Self-Centered Federated Learning (SCFL) attack paradigm, in which attackers not only degrade the performance of the global model through attacks but also enhance their own models within the federated learning process. We propose a framework named FedThief, which degrades the performance of the global model by uploading modified content during the upload stage. At the same time, it enhances the private model’s performance through divergence-aware ensemble techniques—where “divergence” quantifies the deviation between private and global models—that integrate global updates and local knowledge. Extensive experiments show that our method effectively degrades the global model performance while allowing the attacker to obtain an ensemble model that significantly outperforms the global model.

Index Terms:

Federated Learning, Self-Centered, Byzantine Attack.

I Introduction

In the field of machine learning, the quality and diversity of the training data are widely recognized as essential prerequisites for enabling models to generalize effectively to unseen data and perform reliably across a range of downstream tasks [1, 2]. These characteristics directly influence the learned model’s empirical risk minimization, hypothesis space coverage, and robustness to distributional shifts [3, 4]. However, in many real-world scenarios, collecting such high-quality, heterogeneous data poses formidable challenges. Specifically, data acquisition is often constrained by numerous practical and ethical considerations, including cost constraints, institutional barriers, and data privacy concerns [5, 6].

This dilemma is particularly acute for individual organizations or isolated entities, which may lack the resources or legal capacity to aggregate sufficient quantities of useful training data [7]. Furthermore, many domain-specific datasets, especially those involving personally identifiable information (PII), medical records, or financial transactions, contain highly sensitive or confidential data [8, 9]. Sharing such data is subject to stringent regulation under laws such as GDPR [10], HIPAA [11], or regional data compliance mandates. As a result, naively centralizing data for joint model training becomes both legally and technically infeasible [12, 13].

To address these challenges, Federated Learning (FL) has emerged as a decentralized privacy-preserving machine learning paradigm that facilitates collaborative training among multiple distributed data owners without requiring raw data exchange [14, 15, 16]. FL allows all participants—commonly referred to as clients—to train a shared model collaboratively, while only sharing intermediate model updates such as gradients or parameters computed over local datasets [17, 18]. This paradigm significantly alleviates privacy and security risks, and offers a promising alternative for data-isolated systems where collaborative modeling is desirable but direct information interchange is prohibited [19].

Within the FL framework, clients independently maintain their own datasets and locally train private models on this proprietary information [20]. To ensure data locality and minimize privacy leakage, local model training is executed in isolation, and only derived statistics, such as parameter updates or gradients, are periodically shared with a central server [21]. This server, in turn, employs a predefined aggregation strategy (e.g., Federated Averaging [14]) to synthesize a global model that integrates knowledge from all participating clients. The updated global model is then broadcast back to the clients for further fine-tuning on their respective local datasets [22, 23]. This iterative optimization process is repeated over multiple communication rounds until the global model converges to satisfactory performance [24]. Throughout this process, raw data never leaves the client’s device or administrative domain, thereby offering a strong inherent privacy guarantee [25] and making FL fundamentally suitable for privacy-preserving machine learning, particularly in sensitive application scenarios, as illustrated in Figure 1(A).

Despite the privacy-preserving nature of FL, the indirect exchange of information (via intermediate model updates rather than raw data) introduces significant security vulnerabilities [26, 27]. Specifically, since the central server lacks direct access to client-side data, it cannot reliably verify whether the received updates are legitimate or adversarial [28, 29]. In many practical FL deployments, particularly those spanning business alliances, consortia, or industrial coalitions, clients may act as both collaborators and competitors [30]. As such, the existence of strategically malicious participants is not merely a theoretical concern, but a tangible and pervasive threat that can severely compromise collaborative model quality, consistency, and fairness.

Existing literature confirms that malicious clients can conduct Byzantine attacks, a class of adversarial behavior in which compromised clients upload erroneous, misleading, or adversarial updates to the server during aggregation [30, 31, 32, 33, 34]. These attacks degrade the performance of the global model, reduce convergence speed, or destabilize the training process. However, conventional Byzantine attack frameworks largely emphasize disrupting collaboration by reducing the overall utility of the system, without necessarily enabling tangible benefits for the attackers themselves [35, 36]. In other words, malicious participants incur the cost of training and communication in order to sabotage the global model, but such sabotage often degrades the performance of their own private models as well (see Figure 1(B)).

Therefore, conventional attack designs that solely focus on degrading the global model fail to reflect a crucial aspect of real-world adversarial behavior: the self-serving nature of malicious participants. In most practical threat models, attackers are utility-maximizing rational agents whose primary objective is not merely to harm others, but to establish a competitive advantage by securing superior model performance for themselves. For example, in financial or medical domains, holding a better model than other participants can translate into direct economic or strategic gains [37, 38]. This observation leads us to a critical insight: a significant discrepancy exists between existing attack models and the self-interested motivations observed in practical settings. Thus, it is essential to explore alternative attack paradigms that explicitly couple harm to others with benefit to self.

In this paper, we propose a novel attack paradigm termed Self-Centered Federated Learning (SCFL). In SCFL, the attacker’s objective shifts from mere sabotage to gain: malicious clients covertly participate in the federated learning process and exploit it to develop private models that substantially outperform the globally aggregated model shared among all participants (Figure 1(C)). This is accomplished through a two-phase strategy. In the first phase, during local training, attackers perform carefully designed Byzantine-style manipulations to upload distorted updates that lower the quality of the global model. In the second phase, after receiving the aggregated global updates, attackers leverage their prior knowledge of the previously injected malicious components to strategically reverse their effects through correction or filtering, thereby selectively extracting useful information while sustaining performance gains for their private models.

However, implementing the SCFL paradigm is associated with several unique and practical challenges. Firstly, the server’s aggregation mechanism is typically unknown or non-transparent to clients, making it difficult to explicitly quantify adversarial contributions or reliably anticipate the exact form of the aggregated global model. Moreover, in order to craft a superior private model, adversarial clients must intentionally diverge from the optimization trajectory of the global model during local training. As a result, the cumulative divergence between the private and global models tends to increase steadily over communication rounds, potentially amplifying model inconsistencies. This growing divergence raises two tightly coupled challenges.

On one hand, a widening divergence increases the detectability of manipulated updates. That is, as the attacker’s model deviates further in representation space from the global consensus, the server may flag these updates as anomalous or nonconforming, triggering potential countermeasures. On the other hand, increasing divergence reduces the utility of subsequent global updates distributed by the server, making them no longer directly applicable for enhancing the private model. As such, attackers face the dual challenge of remaining stealthy while optimizing for their own performance. Hence, designing a practical and robust mechanism that enables malicious agents to continually train superior private models while remaining undetected constitutes a core technical barrier.

To address these technical difficulties, we propose FedThief—a general framework for constructing SCFL attacks. The design of FedThief is centered on a dual-model architecture, accommodating both attack execution and self-benefit extraction. Specifically, malicious clients simultaneously maintain and train:

• A malicious model that stays aligned with the global model’s training trajectory. This model serves to fabricate adversarial updates for submission to the server, ensuring that the attacker avoids excessive divergence and maintains stealth.

• An ensemble-based private model [39, 40, 41] that leverages divergence-aware optimization strategies. This model integrates three sources of knowledge: (i) the attacker’s own private update direction; (ii) feedback from global model updates; and (iii) synthesized corrections for previously injected adversarial components. This ensemble boosts performance while maintaining a camouflage effect.

We summarize our main contributions as follows:

• We propose Self-Centered Federated Learning (SCFL), a new and practically motivated federated attack paradigm that prioritizes both harming global model integrity and boosting local model performance for malicious clients.

• We introduce FedThief, a novel dual-model attack strategy that enables malicious clients to balance stealth and private gains via divergence-aware ensemble learning, deviating from the global model just enough to remain undetected while consistently maximizing private performance.

• We conduct comprehensive experimental evaluations on multiple benchmarks, which demonstrate that malicious clients using FedThief can consistently obtain private models that outperform the global model in both accuracy and convergence speed, thus empirically validating the practical threat of SCFL in realistic federated learning deployments.

II Related Work

II-A Byzantine Attacks in Federated Learning

Byzantine attacks in federated learning (FL) aim to subvert the global model either by corrupting local data or directly manipulating model updates submitted to the server. These attacks are generally categorized into two types: data-poisoning attacks and model- (or gradient-) poisoning attacks.

Data-Poisoning Attacks

In data-poisoning attacks, adversarial clients poison their local training data with the intent of degrading global model performance on benign inputs. Early work by Van et al. [42] introduces Symmetry Flipping, in which class labels are randomly reassigned with equal probability, thereby injecting label noise that disrupts convergence. Han et al. [43] propose Pair Flipping, where labels are replaced with semantically similar classes to induce targeted misclassification. More recently, Liu et al. [44] develop BadSampler, a clean-label attack method in which malicious clients selectively sample high-loss data points during local training, steering the model toward higher generalization error without altering any ground-truth labels.

A notable subclass of data-poisoning is backdoor attacks, which preserve benign performance while embedding malicious behavior triggered by specific inputs. Sun et al. [45] and Zhang et al. [46] provide surveys of backdoor injection techniques in FL. Bagdasaryan et al. [47] demonstrate that such attacks can succeed even under robust aggregation. Xie et al. [48] propose DBA, which dynamically adjusts trigger strength across clients to enhance stealth and effectiveness.

Model-Poisoning Attacks

In model-poisoning attacks, adversaries craft malicious updates to manipulate the aggregation result at the server. Baruch et al. [49] introduce the LIE attack, where the gradient $g_{m}$ from each malicious client is perturbed by random noise drawn from a Gaussian distribution proportional to the per-coordinate standard deviation $\sigma$ of the benign gradients:

$$\tilde{g}_{m} = g_{m} + z, \quad z_{i} \sim \mathcal{N}(0, \alpha \sigma_{i}).$$

This low-variance perturbation allows malicious contributions to bypass defenses such as Trimmed-Mean and Median. Shejwalkar and Houmansadr [32] propose Min-Sum, where the manipulated gradient satisfies

$$\sum_{j=1}^{N} \| \tilde{g}_{m} - g_{j} \|^{2} \leq \tau,$$

ensuring that the adversarial update remains within a Euclidean ball centered around the benign client cluster. More recently, Ma et al. [50] introduce FedGhost, which dynamically estimates the adversarial contribution in server-side aggregation and adaptively adjusts the attack magnitude in real-time, thereby maximizing disruption while effectively evading detection by standard statistical filters.
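
The two model-poisoning rules above can be sketched in NumPy. This is an illustrative sketch under stated assumptions, not the cited authors' implementations: we treat $\alpha\sigma_{i}$ as the standard deviation of the Gaussian noise, we take the threshold $\tau$ and the perturbation direction as caller-supplied inputs, and the halving search in `min_sum_scale` is a hypothetical way to find a perturbation that satisfies the constraint.

```python
import numpy as np

def lie_perturb(g_m, benign_grads, alpha, rng):
    """Add zero-mean Gaussian noise scaled by the per-coordinate std of benign gradients."""
    sigma = benign_grads.std(axis=0)      # per-coordinate standard deviation sigma_i
    z = rng.normal(0.0, alpha * sigma)    # assumption: alpha * sigma_i is used as the std
    return g_m + z

def min_sum_ok(g_tilde, grads, tau):
    """Check the Min-Sum constraint: sum_j ||g_tilde - g_j||^2 <= tau."""
    return float(((grads - g_tilde) ** 2).sum()) <= tau

def min_sum_scale(g_m, direction, grads, tau, beta=10.0, steps=30):
    """Halve beta until g_m + beta * direction satisfies the constraint (hypothetical search)."""
    for _ in range(steps):
        candidate = g_m + beta * direction
        if min_sum_ok(candidate, grads, tau):
            return candidate, beta
        beta /= 2.0
    # No feasible scale found within the budget: fall back to the unperturbed gradient.
    return g_m, 0.0
```

A caller would pass the stacked benign gradients as a `(num_clients, dim)` array and a generator such as `np.random.default_rng(0)`.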

II-B Defense Strategies Against Byzantine Attacks

To mitigate the impact of Byzantine attacks, a variety of defense strategies have been developed. We categorize existing methods into three groups: distance-based defenses, statistical aggregation defenses, and proxy-dataset-based defenses.

II-B1 Distance-Based Defenses

These methods identify and suppress potentially malicious updates by quantifying their deviation from the majority consensus.

Krum and Multi-Krum

Blanchard et al. [51] propose Krum, which selects the update with the minimal sum of squared distances to its $N-f-2$ nearest neighbors:

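$$\mathrm{score}(g_{i}) = \sum_{j \in \mathcal{N}_{i}} \| g_{i} - g_{j} \|_{2}^{2},$$

where $\mathcal{N}_{i}$ denotes the set of the $N-f-2$ gradients nearest to $g_{i}$. Multi-Krum generalizes this by selecting the $m$ lowest-scoring updates and averaging them:

$$g_{\mathrm{multi}} = \frac{1}{m} \sum_{i \in S_{m}} g_{i}.$$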
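
A minimal NumPy sketch of the Krum score and the Multi-Krum average described above follows. The function names and the dense pairwise-distance computation are our own choices for illustration, and the sketch assumes $N > f + 2$ so that every client has at least one neighbor to score against.

```python
import numpy as np

def krum_scores(grads, f):
    """Score each update by the summed squared distance to its N - f - 2 nearest neighbors."""
    n = grads.shape[0]
    k = n - f - 2                                   # neighborhood size (assumed >= 1)
    diffs = grads[:, None, :] - grads[None, :, :]
    d2 = (diffs ** 2).sum(axis=-1)                  # pairwise squared Euclidean distances
    scores = np.empty(n)
    for i in range(n):
        others = np.delete(d2[i], i)                # exclude the distance to itself
        scores[i] = np.sort(others)[:k].sum()
    return scores

def multi_krum(grads, f, m):
    """Average the m updates with the lowest Krum scores."""
    selected = np.argsort(krum_scores(grads, f))[:m]
    return grads[selected].mean(axis=0), selected
```

Setting `m = 1` recovers plain Krum, since the average of a single selected update is that update.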

FoolsGold

Fung et al. [52] address coordinated collusion among malicious clients by computing pairwise cosine similarity between updates,

$$\cos(g_{i}, g_{j}) = \frac{\langle g_{i}, g_{j} \rangle}{\| g_{i} \|_{2} \, \| g_{j} \|_{2}},$$

and down-weighting clients with highly similar gradients:

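$$w_{i} \propto 1 - \max_{j \neq i} \cos(g_{i}, g_{j}), \quad g_{\mathrm{agg}} = \sum_{i=1}^{N} w_{i} g_{i}.$$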
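
The weighting rule above can be illustrated with a short NumPy sketch. It implements only the simplified formula stated in this section; the published FoolsGold algorithm includes further steps that are not reproduced here. Normalizing the weights to sum to one, and falling back to uniform weights when every raw weight is zero, are our assumptions.

```python
import numpy as np

def foolsgold_weights(grads, eps=1e-12):
    """Weight each client by 1 - (max cosine similarity to any other client), then normalize."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.maximum(norms, eps)
    cos = unit @ unit.T
    np.fill_diagonal(cos, -np.inf)                     # exclude j == i from the max
    raw = np.clip(1.0 - cos.max(axis=1), 0.0, None)    # guard against rounding below zero
    total = raw.sum()
    if total <= 0.0:
        # Assumption: if all clients look identical, fall back to uniform weights.
        return np.full(len(grads), 1.0 / len(grads))
    return raw / total

def foolsgold_aggregate(grads):
    """Weighted sum of client updates using the weights above."""
    return foolsgold_weights(grads) @ grads
```

Two colluding clients that submit the same direction receive weight zero under this rule, while mutually orthogonal updates share the remaining weight.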

II-B2 Statistical Aggregation Defenses

These defenses use coordinate-wise robust statistics to suppress outliers.

Trimmed-Mean

Yin et al. [53] propose Trimmed-Mean, which discards the top $f$ and bottom $f$ values in each dimension and averages the remaining:

$$[g_{\mathrm{trim}}]_{d} = \frac{1}{N - 2f} \sum_{i=f+1}^{N-f} g_{(i),d},$$

where $g_{(i),d}$ denotes the $i$-th order statistic for dimension $d$.

Median

A simpler yet effective method applies the coordinate-wise median aggregation rule:

$$[g_{\mathrm{med}}]_{d} = \mathrm{median}\{g_{1,d}, \dots, g_{N,d}\}.$$
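
Both coordinate-wise rules in this subsection reduce to a sort along the client axis. The sketch below is a minimal NumPy illustration; the function names are ours, and the input is assumed to be a `(num_clients, dim)` array.

```python
import numpy as np

def trimmed_mean(grads, f):
    """Per coordinate, drop the f smallest and f largest values and average the rest."""
    n = grads.shape[0]
    if n <= 2 * f:
        raise ValueError("need more than 2f updates to trim f from each side")
    ordered = np.sort(grads, axis=0)     # order statistics, computed per dimension
    return ordered[f:n - f].mean(axis=0)

def coordinate_median(grads):
    """Per-coordinate median of the client updates."""
    return np.median(grads, axis=0)

updates = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, -100.0]])
print(trimmed_mean(updates, f=1))        # the outlying last row is trimmed in both coordinates
print(coordinate_median(updates))
```

In this toy input the last row is an outlier in both coordinates, and both rules return `[3., 20.]`.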

Bulyan

Guerraoui et al. [54] introduce Bulyan, which combines Multi-Krum with robust statistics. It first selects $2f+2$ gradients via Multi-Krum, and then applies the Trimmed-Mean or Median aggregation rule within this subset to compute the final aggregated update $g_{\mathrm{bulyan}}$.

II-B3 Proxy-Dataset Defenses

This class of methods assumes access to a small, trusted dataset residing at the server.

FLTrust

Cao et al. [55] propose FLTrust, where the server maintains a clean proxy dataset $D_{\mathrm{proxy}}$ to compute a reference gradient:

$$g_{\mathrm{ref}} = \nabla \ell(\theta; D_{\mathrm{proxy}}),$$

and uses cosine similarity to reweight each client’s update:

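$$\alpha_{i} = \max(0, \cos(g_{i}, g_{\mathrm{ref}})), \quad g_{\mathrm{agg}} = \frac{\sum_{i} \alpha_{i} g_{i}}{\sum_{i} \alpha_{i}}.$$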

Clients with updates deviating from the server’s reference receive lower weights.
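
The reweighting described above can be sketched as follows. This covers only the clipped-cosine weighting stated in this section, not every step of the published FLTrust method; returning a zero update when no client has positive similarity is our assumption.

```python
import numpy as np

def fltrust_aggregate(grads, g_ref, eps=1e-12):
    """Weight each client update by its clipped cosine similarity to the server reference."""
    denom = np.linalg.norm(grads, axis=1) * np.linalg.norm(g_ref)
    cos = (grads @ g_ref) / np.maximum(denom, eps)
    alpha = np.maximum(cos, 0.0)              # updates pointing away from g_ref get weight 0
    total = alpha.sum()
    if total <= eps:
        # Assumption: with no trusted direction left, apply no update this round.
        return np.zeros_like(g_ref), alpha
    return (alpha[:, None] * grads).sum(axis=0) / total, alpha
```

An update opposite to the reference gradient, or orthogonal to it, contributes nothing to the aggregate under this rule.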

III Proposed Method

In this section, we first present the overall framework and then describe each core component in detail. The notations used throughout this section are summarized in Table I.

III-A Overview of FedThief

We consider a federated learning system designed for a $C$-class image classification task, consisting of $N$ clients and a central server. An $\alpha$ fraction of these clients is controlled by adversaries; we refer to them as malicious clients. Each client $c_{k}$ maintains a local private dataset $D_{k}=\{(x_{k}^{i},y_{k}^{i})\}_{i=1}^{N_{k}}$, where $x_{k}^{i}\in\mathbb{R}^{d_{\text{in}}}$ represents an input sample with $d_{\text{in}}$ feature dimensions, and $y_{k}^{i}\in\{0,1\}^{C}$ denotes the corresponding one-hot encoded label vector. The dataset size is given by $|D_{k}|=N_{k}$.

Each client trains a local model $f(\theta_{k})$, parameterized by $\theta_{k}\in\mathbb{R}^{d_{\theta}}$, which maps input samples to class predictions:

$$f(\theta_{k}) : \mathbb{R}^{d_{\text{in}}} \to \mathbb{R}^{C}.$$

Federated learning proceeds in an iterative manner, where clients upload local updates and the server aggregates these updates and redistributes the resulting global updates to synchronize the local models of the clients. Benign clients contribute model updates based on local training, while malicious clients deliberately craft manipulated updates to disrupt the global learning process.

As shown in Figure 2, in FedThief, each malicious client independently maintains four local models:

Private model $\theta_{p}$: optimized for local accuracy.
Malicious model $\theta_{m}$: used to craft adversarial updates.
Error model $\theta_{e}$: used to evaluate attack degradation.
Ensemble model $\mathcal{L}$: the final output model of the malicious client, which integrates the outputs of the private, malicious, and error models through a linear regression model $\mathcal{E}_{k}^{t}$.

To facilitate the evaluation of the ensemble model’s performance, a proportion $\frac{1}{v}$ of the local dataset $D_{k}$ is held out as the validation set $D_{k}^{\text{val}}$, which is used for training the linear regression model, while the remaining $\frac{v-1}{v}$ is used as the training set $D_{k}^{\text{train}}$.
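
The $\frac{1}{v}$ hold-out can be sketched as an index split. The random shuffle and the floor division used to size the validation set are our assumptions; the text only fixes the proportions.

```python
import numpy as np

def split_private_dataset(n_samples, v, rng):
    """Return (train_idx, val_idx): 1/v of the samples for validation, the rest for training."""
    idx = rng.permutation(n_samples)      # assumption: the split is a uniform random shuffle
    n_val = n_samples // v                # assumption: round the validation size down
    return idx[n_val:], idx[:n_val]

train_idx, val_idx = split_private_dataset(100, v=5, rng=np.random.default_rng(0))
print(len(train_idx), len(val_idx))       # 80 training and 20 validation indices
```

With $v = 5$, one of the settings reported in Table IV, a client with 100 samples keeps 20 for fitting the ensemble head and 80 for gradient computation.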

III-B Self-Centered Attack Execution

As shown in 1, for each malicious client at round $t$, the malicious model $\theta_{m}^{t}$ is trained independently on $D_{k}^{\text{train}}$ and used solely for generating poisoned gradients.

The gradient of the malicious model is computed as:

$$g_{m}^{t} = \nabla f(\theta_{m}^{t}; D_{k}^{\text{train}}).$$

A perturbation is applied to the malicious gradient to generate the adversarial gradient:

$$\tilde{g}_{m}^{t} = \mathcal{A}(g_{m}^{t}) = g_{m}^{t} + \beta \cdot \delta,$$

where $\mathcal{A}(\cdot)$ denotes the attack function, $\delta$ is the adversarial direction, and $\beta\in\mathbb{R}^{+}$ is its magnitude.

The perturbed gradient $\tilde{g}_{m}^{t}$ is uploaded to the server. Upon collecting updates from all clients, the server computes the global gradient:

$$g^{t} = \mathrm{Aggregate}(\{\tilde{g}_{m}^{t}\} \cup \{g_{\text{benign}}^{t}\}).$$
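
One attack round, as described above, can be sketched with plain averaging standing in for the server's aggregation rule. The choice of $\delta$ is left to the caller because it depends on the attack function $\mathcal{A}$; the mean aggregator and the function names are illustrative assumptions.

```python
import numpy as np

def craft_update(g_m, delta, beta):
    """Apply the perturbation: g_tilde = g_m + beta * delta."""
    return g_m + beta * delta

def aggregate_mean(updates):
    """Stand-in aggregator (plain averaging); robust rules would replace this."""
    return np.mean(updates, axis=0)

g_benign = np.ones((3, 2))                        # three benign clients with identical gradients
g_tilde = craft_update(np.ones(2), delta=-np.ones(2), beta=3.0)
g_global = aggregate_mean(np.vstack([g_benign, g_tilde]))
print(g_global)                                   # the single poisoned update shrinks the mean
```

In this toy round the poisoned update is `[-2., -2.]`, which pulls the averaged gradient from `[1., 1.]` down to `[0.25, 0.25]`.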

III-C Model-Specific Local Update

After receiving the global gradient $g^{t}$, each malicious client updates its local models as follows.

Malicious model update

The malicious model is synchronized with the global direction:

$$\theta_{m}^{t+1} = \theta_{m}^{t} - \eta g^{t}.$$

Private model update

Each malicious client computes the gradient for its private model using its local data:

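$$g_{p}^{t} = \nabla f(\theta_{p}^{t}; D_{k}^{\text{train}}).$$

The malicious clients in $\mathcal{K}_{m}$ share their private gradients and perform a collaborative local update:

$$\theta_{p}^{t+1} = \theta_{p}^{t} - \eta \cdot \frac{1}{|\mathcal{K}_{m}|} \sum_{k \in \mathcal{K}_{m}} g_{p,k}^{t}.$$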

Error model update

The error model explicitly simulates global model degradation by being updated with the poisoned gradient:

$$\theta_{e}^{t+1} = \theta_{e}^{t} - \eta \cdot \tilde{g}_{m}^{t}.$$
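
The three update rules of this subsection differ only in which gradient drives the step. The sketch below treats each model as a flat parameter vector and assumes the private gradients shared within the malicious cohort arrive as a stacked array; the function name and argument layout are ours.

```python
import numpy as np

def update_local_models(theta_m, theta_p, theta_e, g_global, private_grads, g_tilde_m, eta):
    """One round of the malicious, private, and error model updates."""
    theta_m_next = theta_m - eta * g_global                     # follow the global direction
    theta_p_next = theta_p - eta * np.mean(private_grads, axis=0)  # average over malicious cohort
    theta_e_next = theta_e - eta * g_tilde_m                    # follow the poisoned gradient
    return theta_m_next, theta_p_next, theta_e_next
```

Because only the malicious model consumes $g^{t}$, the private model stays isolated from the poisoned aggregate, while the error model deliberately tracks it.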

III-D Divergence-Aware Ensemble Optimization

To effectively exploit the complementary characteristics of diverse internal model predictions, each malicious client constructs a lightweight ensemble classifier designed to aggregate multiple sources of locally derived knowledge. This divergence-aware strategy strengthens the client’s ability to generalize and adapt to malicious objectives without directly compromising individual model integrity, as shown in 2.

Let $z_{m}$, $z_{p}$, and $z_{e}$ denote the predicted logits of the malicious model $f(\theta_{m}^{t})$, the private task model $f(\theta_{p}^{t})$, and an optional error modeling head $f(\theta_{e}^{t})$, respectively, all evaluated over the client’s local validation dataset $\mathbf{x}\in D_{k}^{\text{val}}$. Based on these predictions, a local ensemble classifier $\mathcal{E}_{k}^{t}$ is trained to combine the model outputs into a unified decision-making function. Specifically, this ensemble is realized via multinomial logistic regression, which facilitates a learnable convex combination over the logit space.

The ensemble head is fit by minimizing the cross-entropy loss between its predictions and the ground-truth labels, with L2 regularization to mitigate overfitting. Training uses L-BFGS until convergence. The optimization problem is formally defined as:

$$\min_{\mathcal{E}} \sum_{\mathbf{x} \in D_{k}^{\text{val}}} \mathcal{L}(z_{m}, z_{p}, z_{e}),$$

where $\mathcal{L}(\cdot)$ denotes the regularized multi-class cross-entropy loss. This framework allows each malicious participant to extract synergistic utility from distinct prediction modes—capturing both adversarial trends and task-aligned representations—even in the presence of data heterogeneity or compromised model updates.
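To make the ensemble head concrete, the sketch below fits an L2-regularized multinomial logistic regression on concatenated logits $[z_{m};z_{p};z_{e}]$. It is an illustration under stated assumptions: the feature layout (simple concatenation) is assumed, the function names are hypothetical, and plain gradient descent is substituted for L-BFGS to keep the example dependency-free.

```python
import math
from typing import List, Tuple


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores(W: List[List[float]], b: List[float], x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]


def fit_ensemble(
    features: List[List[float]],
    labels: List[int],
    num_classes: int,
    l2: float = 1e-2,
    lr: float = 0.1,
    steps: int = 200,
) -> Tuple[List[List[float]], List[float]]:
    """Fit softmax regression on concatenated logits (assumed layout).

    Minimizes mean cross-entropy + (l2 / 2) * ||W||^2 with gradient
    descent; the paper states L-BFGS, which is swapped out here.
    """
    n, d = len(features), len(features[0])
    W = [[0.0] * d for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(steps):
        gW = [[l2 * w for w in row] for row in W]  # gradient of the L2 term
        gb = [0.0] * num_classes
        for x, y in zip(features, labels):
            p = softmax(scores(W, b, x))
            for c in range(num_classes):
                err = (p[c] - (1.0 if c == y else 0.0)) / n
                gb[c] += err
                for j in range(d):
                    gW[c][j] += err * x[j]
        W = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, gW)]
        b = [bc - lr * g for bc, g in zip(b, gb)]
    return W, b


def predict(W: List[List[float]], b: List[float], x: List[float]) -> int:
    s = scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)
```

Because the head is learned, a source whose logits are systematically wrong can still be informative: the regression is free to assign it negative weight.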

Upon completion of local ensemble optimization, each malicious client shares its logistic head $\mathcal{E}_{k}^{t}$ with the rest of the malicious cohort. A centralized fusion process is then employed to aggregate these models into a unified global ensemble classifier $\mathcal{L}^{t+1}$ , which acts as a consensus soft teacher. This fusion is realized through a simple yet effective model averaging scheme:

[TABLE]

where $\mathcal{K}_{m}$ denotes the set of all malicious clients. The resulting classifier $\mathcal{L}^{t+1}$ is utilized as a soft-label generator for subsequent knowledge transfer into the client’s private model, forming a crucial part of the targeted adaptation pipeline.
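The fusion step can be illustrated with a short sketch. It assumes that each head $\mathcal{E}_{k}^{t}$ is a weight matrix and bias vector of identical shape across clients, and that "model averaging" means an unweighted element-wise mean; the function name `average_heads` is hypothetical.

```python
from typing import List, Tuple

Head = Tuple[List[List[float]], List[float]]  # (weights W, biases b)


def average_heads(heads: List[Head]) -> Head:
    """Element-wise mean of logistic heads shared by the malicious cohort.

    Assumes all heads have the same shape and equal weight.
    """
    n = len(heads)
    num_classes = len(heads[0][1])
    dim = len(heads[0][0][0])
    W_avg = [
        [sum(h[0][c][j] for h in heads) / n for j in range(dim)]
        for c in range(num_classes)
    ]
    b_avg = [sum(h[1][c] for h in heads) / n for c in range(num_classes)]
    return W_avg, b_avg
```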

To preserve task-specific performance on authorized downstream tasks, each malicious client utilizes locally labeled data to provide a strong supervised learning signal. The traditional cross-entropy loss is computed between the model’s predictions and the true class labels $y\in\{1,\dots,C\}$ , where $C$ denotes the number of classes:

[TABLE]

where $f(\theta_{p}^{t+1})$ denotes the softmax output of the updated private model on input $\mathbf{x}$ . This loss component ensures that the model remains well aligned with its primary utility task and effectively prevents performance deviation due to the integration of adversarial signals.

To incorporate the additional ensemble-based knowledge into the private model, we leverage knowledge distillation. Specifically, we compute the Kullback-Leibler (KL) divergence between the soft predictions from the global ensemble head $z_{\text{ensemble}}$ , and those generated by the private model $z_{p}$ . The distillation loss is expressed as:

[TABLE]

where $P=\sigma(z_{\text{ensemble}})$ and $Q=\sigma(z_{p})$ represent the softmax-normalized logit outputs of the ensemble head and private model, respectively. This mechanism facilitates the alignment of the private model’s predictive distribution with the smoothed and consensus-informed distribution offered by the ensemble teacher.
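A minimal sketch of the distillation term follows. It assumes the divergence is taken as $\mathrm{KL}(P\,\|\,Q)$ with the ensemble as the reference distribution, and that the temperature is applied to both sets of logits before the softmax; neither detail is confirmed by the text above, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in scaled]
    s = sum(e)
    return [v / s for v in e]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0
    )


def distillation_loss(
    z_ensemble: List[float], z_private: List[float], temperature: float = 3.0
) -> float:
    """KL between softened teacher (ensemble) and student (private) outputs."""
    P = softmax(z_ensemble, temperature)  # teacher distribution
    Q = softmax(z_private, temperature)   # student distribution
    return kl_divergence(P, Q)
```

The loss is zero when the private model already reproduces the ensemble's softened distribution and grows as the two distributions diverge.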

To harness the benefits of both direct supervision and soft-label guidance, the training of the private model is driven by a composite loss function that balances the supervised learning signal and distilled ensemble knowledge. The total objective is formulated as:

[TABLE]

where $\lambda\in[0,1]$ is a balancing coefficient that controls the trade-off between standard supervised learning and cross-distribution alignment. The optimization step for updating the private model parameters $\theta_{p}$ is then performed using standard stochastic gradient descent:

[TABLE]

where $\eta$ denotes the learning rate. This comprehensive update strategy enables the private model to steadily evolve across communication rounds, integrating both adversarially-informed ensemble knowledge and task-consistent labeled data to concurrently pursue high downstream performance and malicious objectives.
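The composite objective and its descent step can be checked numerically on a single example. The sketch assumes the form $\lambda\,\mathcal{L}_{CE}+(1-\lambda)\,\mathcal{L}_{KD}$, which is consistent with the later statement that $\lambda=1$ corresponds to using only cross-entropy, and it omits the $T^{2}$ rescaling that some distillation implementations apply. For illustration only, the gradient step is taken directly on the private model's logits rather than on $\theta_{p}$; all function names are hypothetical.

```python
import math
from typing import List


def softmax(z: List[float], t: float = 1.0) -> List[float]:
    s = [v / t for v in z]
    m = max(s)
    e = [math.exp(v - m) for v in s]
    total = sum(e)
    return [v / total for v in e]


def total_loss(z_p, z_ens, y, lam=0.5, tau=3.0) -> float:
    """lam * CE(softmax(z_p), y) + (1 - lam) * KL(P_tau || Q_tau)."""
    ce = -math.log(softmax(z_p)[y])
    P, Q = softmax(z_ens, tau), softmax(z_p, tau)
    kd = sum(p * math.log(p / q) for p, q in zip(P, Q))
    return lam * ce + (1.0 - lam) * kd


def grad_wrt_logits(z_p, z_ens, y, lam=0.5, tau=3.0) -> List[float]:
    """Analytic gradient of total_loss with respect to the student logits."""
    q = softmax(z_p)
    P, Q = softmax(z_ens, tau), softmax(z_p, tau)
    return [
        lam * (q[c] - (1.0 if c == y else 0.0)) + (1.0 - lam) * (Q[c] - P[c]) / tau
        for c in range(len(z_p))
    ]


def sgd_step(z, grad, lr) -> List[float]:
    return [v - lr * g for v, g in zip(z, grad)]
```

In a full implementation the same gradient would be backpropagated through the private network to obtain the update for $\theta_{p}$.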

In summary, the proposed divergence-aware ensemble optimization serves as a critical component for harmonizing local learning dynamics and external knowledge transfer, thereby allowing malicious clients to discreetly exploit global model updates while steadily improving private task utility.

IV Experimental Analysis

IV-A Experimental Setup

IV-A1 Datasets and Model Architectures

To comprehensively evaluate the effectiveness of the proposed FedThief framework and ensure the validity and comparability of our results, we conduct experiments under standard image classification benchmarks and experimental configurations widely adopted in prior federated learning studies [16, 32, 50]. We select three widely used datasets of varying complexity: MNIST [56], CIFAR-10 [57], and Fashion-MNIST (denoted as FASHION hereafter) [58].

Both MNIST and FASHION consist of ten-class grayscale images with resolution $28\times 28$ , whereas CIFAR-10 contains colored images of size $32\times 32\times 3$ . All datasets are evenly and independently partitioned—i.e., under IID settings—across $50$ clients, ensuring class-balanced distributions without data heterogeneity. This allows us to focus exclusively on malicious behaviors and their direct impact, without the confounding factors introduced by non-IID local data.
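A class-balanced IID split of this kind can be sketched as follows. The dealing strategy (shuffle within each class, then assign round-robin) and the function name `iid_partition` are assumptions made for illustration; the text does not specify how the partition is generated.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def iid_partition(
    labels: Sequence[int], num_clients: int = 50, seed: int = 0
) -> List[List[int]]:
    """Split sample indices across clients so each class is spread evenly.

    Indices of each class are shuffled and dealt round-robin, so every
    client receives (almost) the same number of samples per class.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            clients[pos % num_clients].append(idx)
    return clients
```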

For MNIST and FASHION, we implement a lightweight convolutional neural network (CNN) as the client model. It consists of three convolutional layers (kernel size: $3\times 3$ ), each followed by ReLU activation, a $2\times 2$ max-pooling layer, and three fully connected layers. For CIFAR-10, the client model is based on a variant of AlexNet [57], which integrates five convolutional layers, batch normalization, ReLU activations, and max-pooling, followed by dense layers.

All models are locally initialized and trained on clients. Updates are aggregated using standard federated optimization protocols described in Section III.

IV-A2 Attack Method Settings

In practical federated learning systems, malicious participants typically lack access to the precise aggregation algorithms or gradients shared by benign clients. To simulate such realistic threat models, we evaluate FedThief under three prominent Byzantine attack strategies that require neither knowledge of benign updates nor server aggregation rules:

• LIE [49] (Little is Enough): Injects a small but directionally consistent perturbation to scale malicious gradients.

• MinSum [32]: Crafts updates to minimize the sum of distances to benign gradients.

• FedGhost [50]: A query-free, decision-level attack using synthetic outputs instead of raw gradients.
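As an illustration of the first attack in this list, one common formulation of LIE shifts the coordinate-wise mean of a set of reference gradients by $z$ standard deviations. The sketch below follows that formulation under several assumptions: the reference gradients are those the malicious clients compute honestly on their own data, $z$ is treated as a free parameter rather than derived from the client counts, the population standard deviation is used, and the sign of the shift is a convention. The function name `lie_update` is hypothetical.

```python
import statistics
from typing import List


def lie_update(reference_grads: List[List[float]], z: float = 1.0) -> List[float]:
    """Craft a LIE-style update: coordinate-wise mean minus z * std.

    reference_grads: gradients used to estimate per-coordinate statistics
    (assumed here to come from the colluding clients themselves).
    """
    crafted = []
    for coord in zip(*reference_grads):
        mu = statistics.fmean(coord)
        sigma = statistics.pstdev(coord)  # population standard deviation
        crafted.append(mu - z * sigma)
    return crafted
```

Keeping the shift within a small multiple of the standard deviation is what allows the crafted update to stay close to the statistical range of honest updates.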

To characterize the relationship between malicious participation and model resilience, we evaluate each method under two adversarial client ratios: $\alpha=20\%$ and $\alpha=40\%$ .

IV-A3 Defense Method Settings

To benchmark the effectiveness of FedThief against existing learning paradigms and defenses, we compare our method with a variety of widely adopted FL baselines:

• FedAvg [14]: The original and standard federated averaging algorithm used as a baseline.

• FedProx [59]: A variant of FedAvg that introduces a proximal term to handle client heterogeneity.

• Median, Trimmed-Mean [53]: Robust aggregation rules based on coordinate-wise statistics.

• Multi-Krum [51] and Bulyan [54]: Byzantine-resilient methods that leverage geometric filtering to remove suspicious updates before aggregation.
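The two coordinate-wise rules in this list are simple enough to sketch directly. The code below is a generic illustration of coordinate-wise median and trimmed-mean aggregation, not the exact configuration used in the experiments; in particular, the number of values trimmed per side (`trim`) is a free parameter here.

```python
import statistics
from typing import List


def coordinate_median(updates: List[List[float]]) -> List[float]:
    """Aggregate client updates by taking the median of every coordinate."""
    return [statistics.median(col) for col in zip(*updates)]


def trimmed_mean(updates: List[List[float]], trim: int = 1) -> List[float]:
    """Drop the `trim` smallest and largest values per coordinate, then average."""
    if 2 * trim >= len(updates):
        raise ValueError("trim removes all updates")
    out = []
    for col in zip(*updates):
        kept = sorted(col)[trim:len(col) - trim]
        out.append(sum(kept) / len(kept))
    return out
```

Both rules bound the influence of a single extreme update, which is why attacks such as LIE aim to stay inside the range of honest values rather than outside it.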

IV-A4 Evaluation Metrics

Following the SCFL design, we independently evaluate both system-wide model performance and malicious gains. Specifically, we report:

• Global Model Accuracy ( $Acc_{g}$ ): Classification accuracy of the aggregated global model in the final communication round, evaluated on a central held-out test set.

• Malicious Ensemble Accuracy ( $Acc_{e}$ ): Classification accuracy of the ensemble model locally constructed and used by malicious clients.

To quantify the stealthy efficacy of FedThief, we define a malicious advantage metric:

[TABLE]

which captures the extent to which malicious clients outperform the shared global model. For reference, we also report $\widetilde{Acc}_{g}$ , the global model’s nominal baseline accuracy under clean, attack-free settings.
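As a worked example, and assuming the advantage is the simple difference $Acc_{e}-Acc_{g}$ in percentage points (the exact definition is not recoverable from the text here), the MNIST label-flipping figures reported in Section IV-B1 (98.06% for the malicious ensemble against 51.50% for the global model) give an advantage of about 46.56 points. The function name is hypothetical.

```python
def malicious_advantage(acc_ensemble: float, acc_global: float) -> float:
    """Advantage of the malicious ensemble over the global model.

    Assumed to be the plain difference Acc_e - Acc_g, in percentage points.
    """
    return acc_ensemble - acc_global


# Accuracies (in %) quoted in Section IV-B1 for MNIST under label flipping.
advantage = malicious_advantage(98.06, 51.50)
```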

IV-A5 Implementation Details

We configure the system with $50$ clients unless otherwise noted. Each client holds an equal partition of data, strictly under IID conditions. Federated training is performed over $40$ communication rounds, with default hyperparameters set as follows: learning rate $\eta=0.001$ , batch size = 256, and the Adam optimizer.

For local training, each client performs $2$ local epochs per global round on MNIST and FASHION, and $4$ epochs on CIFAR-10. For knowledge distillation in FedThief, we set the temperature hyperparameter $\tau=3.0$ , and the loss combination coefficient $\lambda=0.5$ . All experiments are implemented using PyTorch, and conducted on NVIDIA RTX-3090 GPUs.
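For reference, the settings above can be collected in a single configuration object. The sketch below only restates the values given in this section and in Section IV-A2; the class and field names are hypothetical, and rounding the malicious-client count to the nearest integer is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FedThiefConfig:
    """Default experimental settings as stated in Section IV-A."""
    num_clients: int = 50
    rounds: int = 40
    learning_rate: float = 0.001
    batch_size: int = 256
    optimizer: str = "Adam"
    kd_temperature: float = 3.0  # tau
    kd_lambda: float = 0.5       # loss combination coefficient
    local_epochs: Dict[str, int] = field(
        default_factory=lambda: {"MNIST": 2, "FASHION": 2, "CIFAR-10": 4}
    )

    def num_malicious(self, alpha: float) -> int:
        """Number of malicious clients for an adversarial ratio alpha."""
        return round(alpha * self.num_clients)


cfg = FedThiefConfig()
```

With the two ratios evaluated in the paper, `cfg.num_malicious(0.2)` and `cfg.num_malicious(0.4)` correspond to 10 and 20 of the 50 clients.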

IV-B Comparison Experiments

IV-B1 Global vs. Adversarial Local Performance

To measure the impact and utility of FedThief, we compare the accuracy of the global model with that of the malicious clients’ ensemble model across a variety of attack-defense scenarios, as summarized in Table II.

In nearly all configurations, malicious clients obtain models with higher accuracy than the shared global model. This advantage confirms the success of the SCFL paradigm introduced by FedThief. Importantly, this benefit is achieved without reducing the poisoning strength of the underlying attack: the global model still suffers substantial performance degradation due to malicious behavior, yet the malicious clients themselves retain high model utility.

Further, the severity of global model degradation strongly correlates with the proportion of adversarial clients. For instance, when $\alpha=0.4$ , malicious clients are capable of reducing global accuracy by more than 20%, while maintaining their personal performance levels through ensemble optimization. This demonstrates the asymmetric nature of adversarial benefits facilitated by FedThief.

To better visualize the performance gap over time, we plot the training progress under the LIE attack on CIFAR-10 in Fig. 3. The figure clearly illustrates that malicious clients achieve higher accuracy from the early training stages, and this advantage persists throughout all communication rounds and across different defensive aggregation strategies.

Interestingly, this adversarial advantage is more pronounced on more complex datasets (e.g., CIFAR-10 or FASHION) and smaller on simpler ones such as MNIST. This is likely because MNIST requires less representational capacity to converge, making its models less sensitive to gradient manipulations and to knowledge expansion via distillation. In contrast, more complex tasks benefit substantially from ensemble-guided optimization, providing malicious clients with a tangible private-model boost.

Generalization to Data Poisoning Attacks. While most experimental evaluations focus on model poisoning through manipulated gradients, we further test the generality of FedThief under label-level data poisoning scenarios. Specifically, we conduct attacks using two representative label-flipping strategies: Symmetry Flip, where all class labels are cyclically permuted; and Pairwise Flip, where each label is flipped to its adjacent class. These attacks differ fundamentally from gradient-based Byzantine behavior, as they alter training labels rather than update directions.
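The Pairwise Flip strategy can be sketched in a few lines, assuming that "adjacent class" means the next class index with wrap-around, i.e. $y\mapsto(y+1)\bmod C$. Symmetry Flip is not sketched because the description above does not pin down which permutation is used. The function name is hypothetical.

```python
from typing import List


def pairwise_flip(labels: List[int], num_classes: int) -> List[int]:
    """Flip every label to the next class index, wrapping around at C.

    Assumption: 'adjacent class' is interpreted as (y + 1) mod C.
    """
    return [(y + 1) % num_classes for y in labels]
```

Under this reading every training label on a poisoning client is wrong, which alters the training signal itself rather than the update direction computed from it.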

As demonstrated in Table III, FedThief remains effective across both attack types. In all datasets tested (MNIST, FASHION, CIFAR-10), the ensemble models trained by malicious clients outperform the global model, even though the training process introduces significant label noise. The advantage is most prominent on CIFAR-10, where data complexity provides greater benefit to ensemble-guided optimization. The gap can be large even on a simple dataset: under the Pairwise Flip attack on MNIST, malicious clients achieve 98.06% accuracy with FedThief, compared to only 51.50% for the corrupted global model.

This experimental evidence suggests that FedThief is agnostic to the specific form of the attack vector—whether in parameter space or label domain—and highly capable of extending to a broader class of adversarial strategies beyond model poisoning. The inherent resilience and dynamic adaptability of the ensemble approach underlie its broader applicability in practical federated environments where attack methodologies may evolve or diversify.

IV-B2 Client Utility Spectrum

To rigorously characterize the differential impacts of FedThief, we propose a framework that establishes two fundamental performance limits. The theoretical lower bound is defined by $Acc_{\text{local}}$ , which captures the baseline model accuracy achievable through completely isolated local training without any federated knowledge transfer. The theoretical upper bound is given by $\widetilde{Acc}_{g}$ , representing the maximum attainable accuracy of a global model that remains uncompromised by any adversarial attacks, as FedThief’s knowledge acquisition does not exceed what would be obtained through normal federation.

Given that clients are unaware of the server’s aggregation rule or the global attack state, we compute the expected accuracy each client type (benign or malicious) would perceive as the average across all evaluated aggregation schemes. As illustrated in Fig. 4, this analysis shows that malicious clients consistently exceed both $Acc_{\text{local}}$ and collaborative expectations, highlighting the personal gain facilitated by FedThief. Conversely, benign clients often fall below their local baselines, indicating that their collaboration in a poisoned federation may actively harm their personal training outcome.

These findings expose a critical fairness breach in federated learning: malicious clients can strategically exploit collaboration, significantly outperforming honest users and undermining the equality of benefits in the system.

IV-C Ablation Study

We perform an extensive ablation study to rigorously analyze the individual contributions of FedThief’s core components and systematically evaluate its critical design parameters.

IV-C1 Preservation of Attack Behavior

A central design principle in FedThief is that it should not interfere with attack behavior. To confirm this property, we compare the global accuracy $Acc_{g}$ under traditional Byzantine attacks (e.g., LIE) with and without FedThief. Since only the malicious model $\theta_{m}$ is used when generating gradient updates, FedThief does not alter the attack logic. As confirmed in Table II, attack degradation on the global model is identical, validating that FedThief preserves the underlying attack strength while adding private-model optimization.

IV-C2 Effectiveness of Ensemble Optimization

Ensemble optimization in FedThief is driven by a weighted loss function incorporating two terms: (i) cross-entropy on the private dataset, and (ii) KL divergence from ensemble-generated soft targets. The trade-off between these is controlled via a weight $\lambda$ . We investigate the impact of varying this balancing coefficient over $\lambda\in\{0,0.25,0.5,0.75,1.0\}$ through experiments conducted on CIFAR-10.

As plotted in Fig. 5, using only cross-entropy loss ( $\lambda=1$ ) results in limited generalization, especially for clients with small or biased data. Lower values of $\lambda$ allow the ensemble to guide training, improving both local and ensemble model accuracy. However, too small a $\lambda$ distorts the optimization process—leading to convergence toward the global model, losing private uniqueness. The optimal model performance is observed at $\lambda=0.5$ , demonstrating that an intermediate trade-off between the competing objectives yields favorable results.

IV-C3 Impact of Validation Partition Ratio

Parameter $v$ controls how the private dataset is split between training the malicious model and validating the ensemble. A lower $v$ allocates more data to validation (benefiting the ensemble), while larger $v$ favors attack model training.

Table IV presents performance results under various $v$ values. A moderate split ratio of $v=5$ yields the best balance, enabling both high-quality ensemble optimization and sufficient attack gradient calculation. Too large a value ( $v=10$ ) leaves too little validation data and weakens the ensemble’s generalization capacity, whereas too small a value ( $v=2$ ) diverts too much data to validation and hampers adversarial manipulation strength.

IV-C4 Effectiveness of Ensemble Components

To better understand the internal mechanism of FedThief’s ensemble-guided optimization, we conduct a detailed component-wise analysis on the ensemble structure. Specifically, we investigate the individual and joint contributions of three distinct model sources—namely, the private model $\theta_{p}$ trained on the adversarial client’s local dataset, the malicious model $\theta_{m}$ synchronized with the global model for gradient poisoning, and the error model $\theta_{e}$ which approximates the residual discrepancy between $\theta_{m}$ and the global aggregation output.

Each of these models encodes different inductive biases and information flows during the adversarial training process. $\theta_{p}$ captures client-specific patterns, biases, and unique local semantic distributions. $\theta_{m}$ reflects the poisoned global trajectory, implicitly embedding the impact of attack-induced updates across communication rounds. $\theta_{e}$ serves as a corrective signal that captures counterfactual behavior—representing the divergence introduced through adversarial manipulation.

Table V reports the ensemble model accuracy under various inclusion settings across three benchmark datasets: MNIST, FASHION, and CIFAR-10. When only $\theta_{p}$ is used, the ensemble corresponds to the naive baseline relying solely on private data. Adding $\theta_{m}$ to the ensemble consistently improves accuracy, suggesting that tracking global model dynamics—even in poisoned form—provides additional and useful guidance during model distillation. Interestingly, including $\theta_{e}$ alone with $\theta_{p}$ yields similar improvements, particularly on more complex datasets such as FASHION or CIFAR-10. This empirical result implies that modeling error residuals introduces high-frequency correction signals that can enhance generalization beyond client-local or global knowledge alone.

The full ensemble, incorporating all three model sources, achieves the highest accuracy across all datasets. The most notable gains are observed on CIFAR-10, where the inclusion of $\theta_{m}$ and $\theta_{e}$ contributes a 2.5% improvement over the private-only ensemble. These results indicate that the full ensemble benefits from a multi-view knowledge composition, combining local specificity, global drift, and adversarial perturbation effects into a unified predictive output. Such diversity appears essential for optimal private optimization in the presence of adversarial gradients and federated heterogeneity.

This component-wise ablation confirms that model composition is critical in harnessing the full power of ensemble-based knowledge distillation under adversarial settings. It further strongly supports the core design intuition that combining orthogonal information sources synergistically enhances both robustness and transferability in SCFL.

IV-C5 Effect of Temperature in Distillation

In knowledge distillation, the temperature hyperparameter $T$ (written $\tau$ in Section IV-A5) plays a critical role in shaping the softness of the teacher model’s output distribution. Specifically, a higher temperature flattens the softmax probabilities, exposing the dark knowledge embedded in less confident class predictions, while a lower temperature sharpens the distribution to resemble one-hot targets. This spectrum of logit softening directly influences how effectively the student model (here, the privately distilled model on each malicious client) can align with the ensemble teacher.
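The flattening effect of temperature can be observed by measuring the entropy of the softened distribution. The sketch below is a generic illustration with arbitrary example logits and hypothetical function names; it is not tied to the paper's models.

```python
import math
from typing import List


def softmax_t(logits: List[float], temperature: float) -> List[float]:
    """Softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    e = [math.exp(z - m) for z in scaled]
    s = sum(e)
    return [v / s for v in e]


def softened_entropy(logits: List[float], temperature: float) -> float:
    """Shannon entropy (in nats) of the temperature-softened distribution."""
    return -sum(p * math.log(p) for p in softmax_t(logits, temperature) if p > 0.0)


# Entropy for the temperatures tested in the ablation, on example logits.
example_logits = [4.0, 1.0, 0.0]
entropies = [softened_entropy(example_logits, T) for T in (1, 2, 3, 5)]
```

Entropy rises with temperature and approaches that of the uniform distribution ($\ln 3$ for three classes), at which point the soft targets carry little class-discriminative information.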

To explore this effect within the FedThief framework, we conduct a distillation temperature ablation study on CIFAR-10 using the MinSum attack and FedAvg aggregation, fixing $\alpha=0.2$ . The tested range includes $T\in\{1,2,3,5\}$ , where $T=1$ corresponds to standard softmax and higher values represent increasingly softer logits.

As presented in Table VI, increasing the temperature from 1 to 3 improves ensemble-model accuracy by over 2%, demonstrating the benefits of softened supervision for stabilizing optimization and enhancing generalization under non-identical logit sources. At $T=3$ , the ensemble achieves its peak performance, validating this value as an empirically optimal point for knowledge transfer in our SCFL scenario. Notably, further increasing the temperature to $T=5$ does not result in continued improvement, indicating a saturation effect. When the output becomes overly uniform, the signal-to-noise ratio in the soft targets may degrade, weakening the gradient alignment between teacher and student models.

These findings align with prior distillation literature, which suggests that moderate temperature scaling helps reveal richer inter-class relationships while maintaining discriminability. In our context, it also supports better cross-model mimicry, allowing the malicious client to integrate private, poisoned, and global knowledge effectively into its local model. Thus, tuning the temperature hyperparameter is essential for unlocking the full potential of ensemble-guided optimization in adversarial federated learning.

V Conclusion

This work presents a novel federated learning attack paradigm termed Self-Centered Federated Learning (SCFL), wherein malicious clients can simultaneously degrade the global model’s performance while improving their personal utility. We introduce FedThief, an instantiation of SCFL, which strategically separates global manipulation and local optimization via ensemble-guided training and adaptive model fusion. The framework leverages knowledge distillation and meta-predictive integration to jointly achieve global disruption and private benefit. Extensive experimental evaluations across multiple datasets, attacks, and aggregation rules demonstrate the effectiveness, adaptability, and resilience of the proposed approach under varying levels of adversarial participation.

Discussion and Future Directions. While empirically effective, the proposed FedThief framework also presents several practical challenges. First, its ensemble-based training introduces non-trivial computational and communication overhead for malicious clients. Although this is feasible for resourceful adversaries, reducing the cost while preserving attack efficacy remains an open issue. Second, the effectiveness of SCFL may be influenced by the availability and quality of private data on malicious clients. Since ensemble alignment and local distillation rely on locally-held samples, data that is highly imbalanced, sparse, or non-representative—particularly under non-IID distributions—could reduce knowledge transfer efficiency and slightly diminish the private utility gained. Nonetheless, in practical scenarios, malicious clients may still possess sufficient data diversity to sustain meaningful benefit. To expand applicability, future work could explore methods such as data synthesis or adversarial augmentation to mitigate the impact of limited or biased data.

Future research may explore lightweight attack formulations with reduced model complexity and fewer communication rounds. Moreover, developing adaptive mechanisms—such as self-supervised objectives or on-device data augmentation—to mitigate data scarcity could further enhance robustness under realistic deployment scenarios. From a defensive viewpoint, this work highlights the need for aggregation strategies resilient to utility-driven adversaries. Advancing secure and adaptive aggregation schemes that can detect and counter such stealthy behaviors remains an important direction for building trustworthy federated systems.

Bibliography

[1] L. Zhou, S. Pan, J. Wang, and A. V. Vasilakos, “Machine learning on big data: Opportunities and challenges,” Neurocomputing, vol. 237, pp. 350–361, 2017.
[2] X. Zhou, Q. Yang, X. Zheng, W. Liang, I. Kevin, K. Wang, J. Ma, Y. Pan, and Q. Jin, “Personalized federated learning with model-contrastive learning for multi-modal user modeling in human-centric metaverse,” IEEE Journal on Selected Areas in Communications, vol. 42, no. 4, pp. 817–831, 2024.
[3] X. Rong, J. Zhang, K. He, and M. Ye, “Can: Leveraging clients as navigators for generative replay in federated continual learning,” in Forty-second International Conference on Machine Learning.
[4] J.-W. Li, W.-Z. Shao, Y.-B. Sun, L.-Q. Wang, Q. Ge, and L. Xiao, “Boosting adversarial transferability via relative feature importance-aware attacks,” IEEE Transactions on Information Forensics and Security, 2025.
[5] P. Lameski, A. Dimitrievski, E. Zdravevski, V. Trajkovik, and S. Koceski, “Challenges in data collection in real-world environments for activity recognition,” in IEEE EUROCON 2019-18th International Conference on Smart Technologies. IEEE, 2019, pp. 1–5.
[6] Y. Bai, J. Wang, M. Cao, C. Chen, Z. Cao, L. Nie, and M. Zhang, “Text-based person search without parallel image-text data,” in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 757–767.
[7] M. M. Rashid, Y. Xiang, M. P. Uddin, J. Tang, K. Sood, and L. Gao, “Trustworthy and fair federated learning via reputation-based consensus and adaptive incentives,” IEEE Transactions on Information Forensics and Security, 2025.
[8] C. Wu, F. Wu, L. Lyu, Y. Huang, and X. Xie, “Communication-efficient federated learning via knowledge distillation,” Nature Communications, vol. 13, no. 1, p. 2032, 2022.