Safe-FedLLM: Delving into the Safety of Federated Large Language Models

Mingxiang Tao, Yu Tian, Wenxuan Tu, Yue Yang, Xue Yang, and Xiangyan Tang

arXiv:2601.07177 · cs.CR · April 21, 2026

TL;DR

This paper investigates security vulnerabilities in federated large language models and introduces Safe-FedLLM, a lightweight probe-based defense framework that effectively detects malicious clients without hindering training efficiency.

Contribution

The paper presents a novel defense framework, Safe-FedLLM, that uses probe-based discrimination of LoRA updates to enhance security in federated LLM training.
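To make "probe-based discrimination" concrete, the sketch below trains a small logistic-regression probe on feature vectors that stand in for flattened LoRA updates. Everything here is an illustrative assumption rather than the paper's implementation: the synthetic data generator, the feature dimension, the learning rate, and the helper names `train_probe`, `probe_score`, and `synthetic_update` are all hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_probe(features, labels, lr=0.5, epochs=200):
    """Fit a linear probe with full-batch gradient descent on the logistic loss."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the logistic loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_score(w, b, x):
    """Estimated probability that update features x come from a malicious client."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def synthetic_update(rng, malicious, dim=8):
    # Hypothetical stand-in for a flattened LoRA update: the two classes
    # are drawn from Gaussians with different means (an assumption).
    shift = 1.0 if malicious else -1.0
    return [rng.gauss(shift, 0.5) for _ in range(dim)]

rng = random.Random(0)
labels = [i % 2 for i in range(200)]  # 1 = malicious, 0 = benign
features = [synthetic_update(rng, bool(y)) for y in labels]

w, b = train_probe(features, labels)
preds = [1 if probe_score(w, b, x) >= 0.5 else 0 for x in features]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(f"training accuracy of the probe: {accuracy:.2f}")
```

The synthetic classes are deliberately easy to separate, so the accuracy printed here says nothing about how separable real LoRA updates are; it only shows the shape of the training loop for a lightweight classifier.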

Findings

01. Safe-FedLLM effectively detects malicious clients with high accuracy.

02. The framework maintains training speed and performance on benign data.

03. It remains effective even with high ratios of malicious clients.

Abstract

Federated learning (FL) addresses privacy and data-silo issues in the training of large language models (LLMs). Most prior work focuses on improving the efficiency of federated learning for LLMs (FedLLM). However, security in open federated environments, particularly defenses against malicious clients, remains underexplored. To investigate the security of FedLLM, we conduct a preliminary study to analyze potential attack surfaces and defensive characteristics from the perspective of LoRA updates. We find two key properties of FedLLM: 1) LLMs are vulnerable to attacks from malicious clients in FL, and 2) LoRA updates exhibit distinct behavioral patterns that can be effectively distinguished by lightweight classifiers. Based on these properties, we propose Safe-FedLLM, a probe-based defense framework for FedLLM, which constructs defenses across three levels: Step-Level, Client-Level, and…
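One way a server might act on probe scores is to drop suspicious clients before averaging. The sketch below is an assumption-laden illustration, not the paper's procedure: it makes a single accept/reject decision per client from a precomputed score, uses a plain unweighted mean, and the name `filter_and_average`, the threshold of 0.5, and the client data are all hypothetical. It does not attempt to reproduce the multi-level design that the abstract describes.

```python
def filter_and_average(updates, scores, threshold=0.5):
    """Average only the updates whose probe score is below the threshold.

    updates: dict mapping client id -> list of floats (flattened update)
    scores:  dict mapping client id -> probe score in [0, 1],
             where higher means "more likely malicious"
    Returns (averaged update or None, list of accepted client ids).
    """
    kept = [cid for cid in updates if scores[cid] < threshold]
    if not kept:
        # Every client was rejected; the caller decides how to proceed.
        return None, kept
    dim = len(updates[kept[0]])
    average = [
        sum(updates[cid][j] for cid in kept) / len(kept)
        for j in range(dim)
    ]
    return average, kept

# Hypothetical round with two benign-looking clients and one outlier.
updates = {
    "c0": [0.1, 0.2],
    "c1": [0.3, 0.0],
    "c2": [5.0, -5.0],
}
scores = {"c0": 0.05, "c1": 0.10, "c2": 0.97}

aggregate, kept = filter_and_average(updates, scores)
print("accepted clients:", kept)
print("aggregated update:", aggregate)
```

Returning `None` when every client is rejected keeps the policy decision (skip the round, fall back to the previous global weights, and so on) outside the aggregation helper.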
