Responsible Federated LLMs via Safety Filtering and Constitutional AI

Eunchung Noh; Jeonghun Baek

arXiv:2502.16691 · cs.CL · May 19, 2026


TL;DR

This paper introduces safety filtering and constitutional AI techniques into federated learning for large language models, significantly enhancing their safety and trustworthiness.

Contribution

The paper integrates responsible AI methods into federated LLM training, an area the abstract describes as underexplored, to address the safety risks posed by harmful content in decentralized client data.

Findings

01. Safety filtering and constitutional AI improve LLM safety by over 20% on AdvBench.

02. The methods effectively reduce unsafe and inappropriate responses in federated LLMs.

03. The approach enhances trustworthiness without compromising model performance.

Abstract

Recent research has increasingly focused on training large language models (LLMs) using federated learning, known as FedLLM. However, responsible AI (RAI), which aims to ensure safe and trustworthy responses, remains underexplored in this context. In FedLLM, client-side training data may contain harmful content, resulting in unsafe LLMs that can generate inappropriate responses. Aggregating such models into a global model and redistributing it to clients risks the widespread deployment of unsafe LLMs. To address this, we incorporate two well-established RAI techniques into FedLLM: safety filtering and constitutional AI. Our experiments show that these methods significantly improve LLM safety, achieving over 20% improvement on AdvBench.
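The flow the abstract describes, where each client removes harmful examples before local training and the server then aggregates client models, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword-based `is_unsafe` check stands in for a real safety classifier, the weighted average is a FedAvg-style aggregation assumed here because the abstract does not name the aggregation rule, and all function names and example strings are hypothetical. Constitutional AI is not shown because it requires a language model to critique and revise responses.

```python
from typing import Callable, Dict, List, Sequence

Example = Dict[str, str]  # {"instruction": ..., "response": ...}

# Hypothetical marker list; a real system would use a trained safety classifier.
UNSAFE_MARKERS = ("build a bomb", "steal credit card")


def is_unsafe(example: Example) -> bool:
    """Toy stand-in for a safety classifier (assumption, not from the paper)."""
    text = (example["instruction"] + " " + example["response"]).lower()
    return any(marker in text for marker in UNSAFE_MARKERS)


def filter_client_data(
    data: Sequence[Example],
    unsafe_fn: Callable[[Example], bool] = is_unsafe,
) -> List[Example]:
    """Client-side safety filtering: keep only examples not flagged as unsafe."""
    return [ex for ex in data if not unsafe_fn(ex)]


def weighted_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """FedAvg-style aggregation (assumed): average weights by client data size."""
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("no client contributed any training examples")
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    client_data = [
        {"instruction": "How do I bake bread?", "response": "Mix flour and water."},
        {"instruction": "How to build a bomb?", "response": "..."},
    ]
    clean = filter_client_data(client_data)
    print(len(clean), "of", len(client_data), "examples kept")

    # Two clients with toy 2-parameter "models" and 1 vs. 3 filtered examples.
    print(weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Filtering before local training means the size used to weight each client is its count of retained examples, so a client whose data is mostly flagged contributes proportionally less to the global model in this sketch.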


Taxonomy

Topics: European Criminal Justice and Data Protection · Privacy-Preserving Technologies in Data · Artificial Intelligence in Law