FedEAT: A Robustness Optimization Framework for Federated LLMs

Yahao Pang; Xingyuan Wu; Xiaojin Zhang; Wei Chen; Hai Jin

arXiv:2502.11863·cs.LG·February 18, 2025

Open Access · 3 Reviews

TL;DR

FedEAT is a novel framework that enhances the robustness of federated large language models by applying adversarial training in embedding space and using robust aggregation, addressing data heterogeneity and adversarial threats.
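To make "adversarial training in embedding space" concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the toy model (mean-pooled token embeddings fed to a linear classifier with logistic loss), the single gradient step, the L2 budget `epsilon`, and all function names are assumptions chosen for illustration. The idea shown is that the perturbation is added to continuous embeddings rather than to discrete tokens.

```python
import numpy as np

def logistic_loss(embeddings, w, b, label):
    """Toy loss: mean-pool token embeddings, apply a linear classifier.

    label is +1 or -1; the loss is log(1 + exp(-label * logit)).
    """
    pooled = embeddings.mean(axis=0)
    logit = pooled @ w + b
    return np.logaddexp(0.0, -label * logit)

def loss_grad_wrt_embeddings(embeddings, w, b, label):
    """Analytic gradient of the toy loss with respect to every token embedding."""
    seq_len = embeddings.shape[0]
    pooled = embeddings.mean(axis=0)
    logit = pooled @ w + b
    # d/dlogit log(1 + exp(-y * z)) = -y * sigmoid(-y * z)
    dlogit = -label / (1.0 + np.exp(label * logit))
    # Each token contributes w / seq_len to the logit through mean pooling.
    return np.tile(dlogit * w / seq_len, (seq_len, 1))

def embedding_perturbation(embeddings, w, b, label, epsilon):
    """One gradient-ascent step in embedding space, scaled to L2 norm epsilon."""
    grad = loss_grad_wrt_embeddings(embeddings, w, b, label)
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(embeddings)
    return epsilon * grad / norm

# Hypothetical toy data: 6 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 8))
w = rng.normal(size=8)
b = 0.1
label = 1
epsilon = 0.5

delta = embedding_perturbation(embeddings, w, b, label, epsilon)
clean_loss = logistic_loss(embeddings, w, b, label)
adv_loss = logistic_loss(embeddings + delta, w, b, label)
print("clean loss:", clean_loss)
print("adversarial loss:", adv_loss)
```

In an adversarial training loop, a client would then update its parameters on the perturbed embeddings (possibly together with the clean ones). How FedEAT weights or regularizes these terms is not stated in this summary.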

Contribution

Introduces FedEAT, a new robustness optimization framework for federated LLMs combining adversarial training and geometric median aggregation.
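Geometric median aggregation can be illustrated with the Weiszfeld iteration, a standard way to approximate the geometric median. The sketch below is an assumption-laden illustration, not the paper's code: client updates are flattened into vectors, and the function name, tolerance, and toy data are hypothetical. It shows why the geometric median resists a single malicious update far better than a plain average does.

```python
import numpy as np

def geometric_median(points, weights=None, max_iter=100, tol=1e-8, eps=1e-12):
    """Approximate the (weighted) geometric median with Weiszfeld's algorithm.

    points: array of shape (n_clients, n_params), one flattened update per row.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    weights = np.ones(n) if weights is None else np.asarray(weights, dtype=float)

    # Start from the weighted mean, then iteratively re-weight each point
    # by the inverse of its distance to the current estimate.
    median = np.average(points, axis=0, weights=weights)
    for _ in range(max_iter):
        dists = np.linalg.norm(points - median, axis=1)
        inv = weights / np.maximum(dists, eps)  # eps avoids division by zero
        new_median = (inv[:, None] * points).sum(axis=0) / inv.sum()
        converged = np.linalg.norm(new_median - median) < tol
        median = new_median
        if converged:
            break
    return median

# Hypothetical example: four honest client updates near (1, 1), one malicious outlier.
updates = np.array([
    [1.0, 1.0],
    [1.1, 0.9],
    [0.9, 1.1],
    [1.0, 1.05],
    [50.0, -50.0],
])
mean_agg = updates.mean(axis=0)
gm_agg = geometric_median(updates)
print("mean aggregate:", mean_agg)
print("geometric median aggregate:", gm_agg)
```

The mean is dragged toward the outlier, while the geometric median stays within the cluster of honest updates. In a LoRA-based setup, the rows would be flattened adapter weights rather than 2-D points.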

Findings

01 FedEAT improves robustness against adversarial attacks.

02 Minimal performance loss observed with FedEAT.

03 Effective in handling data heterogeneity.

Abstract

Large Language Models (LLMs) have made significant advances in natural language understanding and automated content creation. However, they still face persistent problems, including substantial computational costs and limited availability of training data. Combining Federated Learning (FL) with LLMs (federated LLMs) offers a solution by leveraging distributed data while protecting privacy, which makes it well suited to sensitive domains. However, federated LLMs still suffer from robustness challenges, including data heterogeneity, malicious clients, and adversarial attacks, which greatly hinder their application. We first introduce the robustness problems in federated LLMs. To address these challenges, we propose FedEAT (Federated Embedding space Adversarial Training), a novel framework that applies adversarial training in the embedding…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1) The paper addresses inference-time adversarial robustness for federated LLMs, in settings where computational and communication limits rule out standard multi-step, full-parameter adversarial training.

2) The evaluation provides broad empirical coverage, spanning multiple model architectures, diverse task domains, and various attack types (e.g., black-box and white-box).

3) The proposed method is both conceptually straightforward and practically implementable.

Weaknesses

The experiments are not conducted entirely correctly, for several reasons:

- The experiments run for just one epoch, which is not truly federated, especially if the non-IID setup is created through simple sample distributions across clients. The experiments then become very similar to the centralized setting. Since the authors don't include even one ablation with multi-epoch training, I am not sure any conclusions can be drawn.

- Second, the

Reviewer 02 · Rating 4 · Confidence 4

Strengths

The proposed method is clearly illustrated through a well-structured flow diagram, and the experimental evaluation covers a broad range of datasets. The ablation experiments are also relatively comprehensive and provide useful insights.

Weaknesses

1. The paper claims to address two specific challenges—compute and memory efficiency, and distributed robustness under communication bottlenecks. However, the proposed method does not provide a detailed explanation of how each of these challenges is explicitly handled. In addition, the experiments do not clearly demonstrate how the proposed approach mitigates these issues. It remains unclear whether the main novelty lies solely in the introduction of a regularization term, as the adversarial sam

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- The paper tackles an important and underexplored problem: adversarial robustness in federated large language models.

- The embedding-space adversarial training idea is a logical and computationally efficient adaptation of prior adversarial training strategies to the federated context.

- Empirical results are broad, covering several model architectures and domains.

- The algorithm is clearly formulated, and the code availability supports reproducibility.

Weaknesses

1. Limited Novelty

- The proposed method is a straightforward combination of known techniques, primarily adversarial training in embedding space (Xhonneux et al., 2024) and LoRA-based federated fine-tuning.

- There is no fundamentally new optimization principle or theoretical advancement; the contribution is mainly empirical.

2. Lack of Theoretical Foundation

- The work lacks formal analysis of robustness guarantees, convergence properties, or communication complexity.

- Claims of efficiency a

Taxonomy

Topics: Digital Rights Management and Security · Scheduling and Optimization Algorithms